CHAPTER 1:
Introduction


Analysts are frequently confronted with time-series data. A simple and common form pairs a
magnitude (or count) with a time frame: the number of e-mails sent, the number of cell phones
called, purchases made by volume or cost, or a variety of other time-derived data. Studying the
temporal dimension of data gives analysts more opportunities to find relational ties and trends,
classify or group like activity, and narrow the search space of massively complex and large
datasets.

There are three basic methods of finding these patterns today. The first is by hand, which
requires heavy human interaction; it is slow, but can be aided by software visualization tools.
The second is supervised learning, which requires both human and machine interaction; done
right, this approach can combine the strengths of both. The third is full automation, which
requires no human interaction; however, automatically generated rules can be nonsensical and of
limited value. As computer speeds increase, there is hope that someday computers might be able
to “think” like a human. While thinking machines are a staple of good science fiction, the state
of the art in artificial intelligence is well below the sentient mark and looks to remain there
for some time. That is why supervised learning methods are the most practical and relevant for
today’s data mining efforts.

This  thesis  presents  a  new  supervised  learning  approach  called  the  Rule  Based  Intuition  (RBI) 
system.  The  RBI  methodology  can  evaluate  time-series  data  by  finding  the  best  fitting  rule, 
from  a  repository  of  known  rules,  to  quickly  infer  information  about  the  data.  Currently,  the  best 
scientists  can  do  is  to  optimize  and  combine  the  strengths  of  both  human  and  computer  to  help 
find  the  needed  information.  This  concept  is  the  idea  behind  RBI.  The  RBI  method  attempts 
to  maximize  the  best  capabilities  of  both  humans  and  machines.  By  using  known  temporal 
patterns,  analysts  can  combine  the  power  and  speed  of  computers  with  their  own  knowledge 
to  reduce  the  necessary  search  space,  find  relevant  information,  and  identify  necessary  causal 
relationships. 

Finding temporal patterns is a very difficult problem, although humans seem to be good at this
type of pattern recognition. For example, when a large company with several thousand personnel
works overtime on a time-critical project, the number of pizzas delivered to the company
building increases drastically. This pattern is repeated several times throughout the year. A
competitor notices that immediately after each increase in pizza sales, the company announces a
hostile takeover. Imagine instead that the company is a news agency and pizza orders increase
just before a major breaking news story, or that the company is the Pentagon and this trend is a
precursor to military operations. In 1990, Time magazine published “And Bomb The Anchovies” [1],
which correlates purchases of pizza at the Pentagon with Iraq’s invasion of Kuwait. If a local
pizza delivery person can make these causal connections, what can trained analysts with better
tools do?¹

To the author’s knowledge, the RBI methodology is new to data mining. The literature review in
Chapter 2 found no prior work that takes this approach to data discovery. The RBI methodology is
designed to be modular and extensible, as rules can be developed and stored in a database for
shared access. The modular design is well suited for remote analysis: knowledge experts develop
rules, while less-trained field collectors automatically correlate data with reach-back to the
experts. This approach can be developed for analysts viewing large sets of data, or for locally
captured data, providing correlation of users’ temporal activity.

Using  the  RBI  methodology,  this  thesis  investigates  the  following  questions:  If  we  already  know 
a  temporal-spatial  pattern,  can  we  use  what  we  know  to  help  us  find  what  we  need?  Is  there  a 
fast,  proven  method  to  take  our  temporal  knowledge,  evaluate  it,  and  apply  it  to  data  we  have 
not  seen  before  to  tell  us  something  new?  What  DoD  applications  might  this  approach  have? 

To investigate these questions, this thesis presents the simple RBI framework, written in the
Python programming language, for creating temporal rules. We define a temporal rule as a simple
boolean function that, when given an event located at a specific time, returns either True
(meaning that the event is covered by the rule) or False (meaning that the event is not covered
by the rule). Each rule is evaluated using a Poisson linear regression to determine which rule
or rules best fit each dataset. This framework is used to create 137 rules, and this set of
rules is then used to process three data sets:


• The number of calls during the 1999 calendar year to the Automated Computer Time
Service (ACTS) operated by the National Institute of Standards and Technology [3].

• The number of calls each day to a bank call center in Israel during the 1999 calendar year
[4].

• The number of terrorist attacks on each day of the 1999 calendar year, as tracked by the
Global Terrorism Database.

¹ For a more thorough review of the history of traffic analysis and its application to
military/intelligence security, read “Introducing Traffic Analysis” [2].

RBI is a new temporal analysis approach and is applicable in several areas of research,
intelligence collection, information operations, and user classification. Its simple modular
design and implementation make it a natural addition to the analyst’s tool set.
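
The definition of a temporal rule as a boolean function of an event's time can be illustrated with a minimal Python sketch. The rule names (`is_weekend`, `is_business_hours`), the `coverage` helper, and the sample events are hypothetical illustrations and are not taken from the 137 rules used in this thesis; the regression step that scores each rule is not shown.

```python
from datetime import datetime

def is_weekend(event_time: datetime) -> bool:
    """Hypothetical rule: True if the event falls on a Saturday or Sunday."""
    return event_time.weekday() >= 5  # Monday == 0 ... Sunday == 6

def is_business_hours(event_time: datetime) -> bool:
    """Hypothetical rule: True for Monday-Friday between 09:00 and 16:59."""
    return event_time.weekday() < 5 and 9 <= event_time.hour < 17

def coverage(rule, events) -> float:
    """Fraction of events covered by (returning True for) the given rule."""
    events = list(events)
    if not events:
        return 0.0
    return sum(1 for event in events if rule(event)) / len(events)

# Illustrative events only; these are not drawn from the thesis data sets.
events = [
    datetime(1999, 1, 2, 10, 0),  # a Saturday morning
    datetime(1999, 1, 4, 10, 0),  # a Monday morning
    datetime(1999, 1, 4, 20, 0),  # a Monday evening
]

weekend_share = coverage(is_weekend, events)
business_share = coverage(is_business_hours, events)
```

Because every rule shares the same signature (one timestamp in, one boolean out), rules written this way can be stored, shared, and applied to unseen data uniformly, which is the property the modular design relies on.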

1.1  Motivation 

Much of today’s event-based research is geared toward finding temporal patterns, identifying
change events, and discovering useful repeating patterns. Another facet of temporal data mining
is data discovery: the ability to find relevant information by looking at how and when events
occur and placing them in the proper context.

The amount of digital data has increased exponentially in the last 20 years, causing an
explosion of ubiquitous and interconnected information. As the sea of information grows,
agencies and businesses struggle to quickly find repeatable patterns and to identify causal
relations with significant events of interest. Many current methods are computationally
expensive, can only be done in large database warehouses, or require unique expertise to find
the data and their connections. The increased demands placed on analysts to find useful,
relevant information make automated information retrieval a requirement. Temporal analysis adds
a capability not widely available to analysts.

Many intelligence analysts have calendars with anniversary dates and workflow wheels collected
over years of dedicated observation. However, maintaining and applying them is a very manual,
labor-intensive process. The data has been collected, but until now there has been no method to
automate this extensive temporal expertise. The RBI methodology is the tool that can bridge this
capability gap.

1.2  Temporal  Analysis 

To understand these concepts better, we begin with a rudimentary understanding of the
requirements of temporal analysis and of the different philosophical and computational
approaches to understanding time. Any effective temporal system should meet the following
criteria:

• Must allow for imprecise measurements of time (IMT). For example, computer-generated
logs are often incorrect by hours or sometimes days. By looking specifically at just dates
and times, one might miss temporal relations.

• Must allow for imprecision in data (ID). One might not know the exact relationship
between two events, but the system should be robust enough to understand or determine
partial relations.

• Must conform to the right degree of time (RDT). Some events happen in years and others
happen in hours or even microseconds [5].

With these criteria in place, we evaluate the different philosophical views of time and how
they relate to computational implementation. We then grade each approach against the criteria
on a simple (+) or (-) system and list the capabilities and limitations of each approach. A (+)
sign indicates that the criterion is easy to implement in that view of time. A (-) sign does not
indicate a failure of the view of time; rather, it indicates that the criterion is difficult to
implement in terms of complexity, cost of time, or cost of resources.

1.2.1  Different  Views  of  Time 

There have been centuries of research on the topic of time, from the philosophical to the
modern computational. Many brilliant minds have struggled with different aspects of time.
Understanding the different views of time and their origins is important. There are three
fundamental views of time: one is that time is moving or flowing with events in the past,
present, and future (Date/Time Line Systems); another is that time is based on causality or
observed events (State Space and Formal Methods Modeling); the third is that time is based on
perceived instances defined by relative observation (Relative Sequential Chaining). Each of
these philosophical views of time affects the method used to find data, and analysts and
computer scientists need to understand them in order to understand the capabilities and
limitations of the different implementations of temporal analysis.

Aristotle and Newton (Date/Time Line Systems) This view is often called the classical view
of time. Aristotle viewed time as a magnitude of movement. Newton framed time in the physical
world in much the same way as Aristotle. One of Newton’s contributions is the idea that time is
flowing. For example, as a man walks across the room, time flows as he moves. In this view of
time, events are temporally anchored in the physical world. This view is similar to looking at a
calendar: an event happens on Wednesday, or an event happened after December 15, 1993. This is
also called anchored time, as it is set by a date or dates, with the first instance being the
anchor.

For example, computer system clocks use an anchored time to determine the “time.” In this
case, time is a count of seconds t = (1, 2, 3, ..., n) from an arbitrary anchor date, which in
UNIX systems is January 1, 1970, at 00:00 UTC. In this example, January 8, 2010, at 08:06 is
1,262,937,960 seconds from January 1, 1970. The computer counts the seconds from the anchored
date and then displays the local time on January 8, 2010. Anchored time can be thought of as
existing outside of the observation or occurrence and is used to measure or relate events. This
is helpful when dealing with multiple timelines because they can be compared easily.
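
The anchored count in the example above can be reproduced with Python's standard library. This is a minimal sketch that assumes the 08:06 timestamp is expressed in UTC; the helper name `seconds_since_epoch` is illustrative.

```python
from datetime import datetime, timezone

# The UNIX anchor date: January 1, 1970, at 00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def seconds_since_epoch(moment: datetime) -> int:
    """Count whole seconds between the anchor date and the given moment."""
    return int((moment - EPOCH).total_seconds())

# January 8, 2010, at 08:06, assumed here to be a UTC time.
moment = datetime(2010, 1, 8, 8, 6, tzinfo=timezone.utc)
count = seconds_since_epoch(moment)

# Converting the count back recovers the original date and time.
recovered = datetime.fromtimestamp(count, tz=timezone.utc)
```

Because two timelines expressed as counts from the same anchor are plain integers, they can be compared or merged directly.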

This temporal view focuses on building timelines from instances of specific dates/times. This
approach is useful and easy for computers, as the date/time becomes the reference. It is a good
model for workflow analysis (the study of when and in what order people do work). Workflow is
often connected to date/time references such as sunrise, sunset, and holidays. The date line
approach is not flexible, as it may not be possible to assign an event to a precise date. To
make it more flexible, these systems can define time in terms of a window, which adds
complexity and ambiguity to the system. This model seems to do a poor job of capturing relative
temporal information when windows overlap; the overlap leads to greater complexity and less
accuracy (Figure 1.1). The RBI system presented in this thesis uses this temporal view but does
not suffer from this form of complexity, because the uncertainty of time is dealt with in the
Poisson regression discussed in Chapter 3.

Kant (State Space and Formal Methods Modeling) Kant explains that “experience is possible
only through the representation of a necessary connection of perceptions” [6]. He held that all
perceptions are grounded in time. He goes on to say that “all changes take place according to
the law of connection between cause and effect” [6].

This type of reasoning can be viewed as a state machine, with time being the connector between
states. As a connector of causal events, each temporal tick t happens when there is a
transition from state A (the starting point of a man in a room) to state B (the ending point
across the room). Every discrete effect is modeled as a state and every transition is a unit of
time.

This temporal view is useful for simple problem-solving tasks and does not suffer from issues
of complexity due to IMT or ID. However, this approach has limitations, as it requires
remembering, storing, and searching all previous states. An important note is that each t might
be of a different length, which can lead to difficulties if the state transitions of two events
are happening in parallel, as the temporal length of one state transition does not necessarily
match that of the other (Figure 1.2). As technology improves, there is hope that some of these
shortcomings can be surmounted.
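
The state-machine view, and the difficulty caused by ticks of unequal length, can be sketched as follows. The `Transition` class, the `run` function, the state names, and the durations are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transition:
    source: str
    target: str
    duration_s: float  # wall-clock length of this tick (illustrative value)

def run(start, transitions):
    """Follow the transitions from `start`.

    Returns the full state history (which must be stored), the number of
    ticks (one per transition), and the total elapsed wall-clock seconds.
    """
    state = start
    history = [state]
    elapsed_s = 0.0
    for step in transitions:
        if step.source != state:
            raise ValueError(f"cannot take {step.source!r} -> {step.target!r} from {state!r}")
        state = step.target
        history.append(state)
        elapsed_s += step.duration_s
    ticks = len(history) - 1
    return history, ticks, elapsed_s

# One tick: a man crosses the room from A to B.
walk_history, walk_ticks, walk_elapsed = run("A", [Transition("A", "B", 5.0)])

# Two ticks of very different lengths in a second, parallel machine.
call_history, call_ticks, call_elapsed = run(
    "idle",
    [Transition("idle", "ringing", 2.0), Transition("ringing", "connected", 20.0)],
)
```

Counting ticks alone says that the second machine is "two units" along while the first is "one unit" along, yet the elapsed seconds differ by more than a factor of four; aligning parallel machines therefore needs information beyond the tick count.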

Einstein  (Relative  Sequential  Chaining)  Einstein  is  famous  for  many  ideas,  but  arguably,  the 
most  important  to  science  are  his  thoughts  on  relativity.  He  describes  time  as: 

Every  reference  body...  has  its  own  particular  time;  unless  we  are  told  the  reference 
body  to  which  the  statement  of  time  refers,  there  is  no  meaning  in  a  statement  of 
the time of an event. [7]

Relative Sequential Chaining captures relative temporal information. However, as the amount
of temporal information grows, the system suffers from search and memory issues (Figure 1.3).
Temporal logic helps alleviate some of these challenges. Temporal logic is propositional logic
with a temporal twist. For example, if A happens before B and B happens before C, then A
happens before C. James Allen defined thirteen basic possible temporal relationships and
developed a transitivity table that is a fundamental cornerstone of relative temporal logic [5].
This has been a growing field of interest, especially in the business community, as people and
organizations attempt to make personal interactions and market predictions more effective. Much
of this area of study focuses on individuals and their work and consumption activities.
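
As an illustration of the thirteen interval relationships, the sketch below classifies how one interval relates to another. The function name `allen_relation` and the relation labels follow common usage rather than any notation in this thesis, and the transitivity table itself is not reproduced.

```python
def allen_relation(a, b):
    """Classify the relation of interval a to interval b.

    Each interval is a (start, end) pair with start < end. Exactly one of
    thirteen relation names is returned.
    """
    (a0, a1), (b0, b1) = a, b
    if a0 >= a1 or b0 >= b1:
        raise ValueError("each interval needs start < end")
    if a1 < b0:
        return "before"
    if b1 < a0:
        return "after"
    if a1 == b0:
        return "meets"
    if b1 == a0:
        return "met-by"
    if a0 == b0 and a1 == b1:
        return "equals"
    if a0 == b0:
        return "starts" if a1 < b1 else "started-by"
    if a1 == b1:
        return "finishes" if a0 > b0 else "finished-by"
    if b0 < a0 and a1 < b1:
        return "during"
    if a0 < b0 and b1 < a1:
        return "contains"
    # Only partial overlap with four distinct endpoints remains.
    return "overlaps" if a0 < b0 else "overlapped-by"

# The transitivity example from the text: A before B, B before C, so A before C.
A, B, C = (1, 2), (3, 4), (5, 6)
chain = (allen_relation(A, B), allen_relation(B, C), allen_relation(A, C))
```

Note that the classification uses only the relative order of endpoints, not their absolute values, which is why this view does not depend on anchored dates.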

1.2.2  A  Historical  Example 

Perhaps one of the most famous uses of date line temporal analysis is that of John Snow, a
doctor in London in 1855. His work is unique in that it combined date line temporal analysis
with spatial analysis to great effect. He describes the event as:

The most terrible outbreak of cholera which ever occurred in this kingdom, is
probably that which took place in Broad Street, Golden Square, and the adjoining
streets, a few weeks ago. Within two hundred and fifty yards of the spot where
Cambridge Street joins Broad Street, there were upwards of five hundred fatal
attacks of cholera in ten days. The mortality in this limited area probably equals any
that was ever caused in this country, even by the plague; and it was much more
sudden, as the greater number of cases terminated in a few hours. The mortality would
undoubtedly have been much greater had it not been for the flight of the population
[8].


The way cholera spread was not well understood, which is why John Snow’s use of temporal and
spatial mapping was so revolutionary. He correlated the possible water contamination to a rain
storm, which burst sewer piping that overflowed into kitchen drinking water and eventually
contaminated a local water source. He organized the event by numbers of dead and sick per day
by location and combined them with drawings of public works piping in the city. With this, he
was quickly able to deduce that water was the only possible source of contamination and that it
was confined to a single water pump. Upon physical investigation of the water, he confirmed the
contaminated source and had the handle removed from the pump, which eliminated the risk to
others around the pump. A few weeks after the handle was removed, he was able to return to the
area for further study.

Snow’s book uses a map (Figure 1.4) to mark the deaths of the people around the pump. Each
death is shown as a small black rectangle. He noted two areas that had fewer deaths than
expected. The workhouse had 535 people living in it at the time of the outbreak. Given the
number of deaths surrounding the workhouse, there should have been approximately 100 deaths;
there were only five. The brewery, which employed 70 people, had no deaths. As it turned out,
both areas had other sources of water on their property. This is a clear case where temporal
and spatial pattern recognition helped end a devastating epidemic.


1.2.3  Local  Time 

What  time  is  it?  Asked  this  question,  most  people  would  look  at  their  watch,  a  cell  phone,  or  a 
nearby  clock.  Asked  this  question  before  the  industrial  age,  people  would  answer  by  looking  at 
a  water  clock,  a  sundial,  or  the  sun.  All  of  these  different  techniques  for  learning  the  time  report 
local  time — the  time  that  people  experience. 

For  thousands  of  years,  local-observed  time  was  the  primary  time  reference.  People  rarely  had 
the  need  to  synchronize  time  accurately.  When  they  did,  they  were  able  to  use  bells,  drums,  or 
later, clock towers.

Time Zones

Local time is entirely dependent upon longitude; if one city is 3 degrees to the east of a
second, then 12 minutes will pass between the instant that the Sun passes through the zenith of
the first city and the instant that it passes through the zenith of the second. Still, this did
not present much of a problem to humanity until the development of bidirectional instantaneous
long-distance communications (requiring two parties to synchronize their actions) and congested
single-track long-distance trains (requiring that trains time their usage of the single-track
resource).
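
The 12-minute figure follows from the Earth turning through 360 degrees of longitude in 24 hours, or 4 minutes per degree. A minimal sketch of that arithmetic, with an illustrative function name:

```python
# The Earth turns 360 degrees in 24 hours, so each degree of longitude
# corresponds to 24 * 60 / 360 = 4 minutes of local solar time.
MINUTES_PER_DEGREE = 24 * 60 / 360

def solar_time_difference_minutes(longitude_difference_deg: float) -> float:
    """Local solar time difference for two places separated in longitude."""
    return longitude_difference_deg * MINUTES_PER_DEGREE

three_degrees = solar_time_difference_minutes(3)     # the example in the text
fifteen_degrees = solar_time_difference_minutes(15)  # a full hour of separation
```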

Scottish-born Canadian inventor Sir Sandford Fleming suggested a worldwide system of time
zones in 1878. He proposed 24 meridians, each 15 degrees or one hour apart in longitude,
starting from Greenwich. The local time for each zone would be the time of the meridian that
bisected it. On November 18, 1883, most United States and Canadian railroads began to use this
system, which reduced the number of time zones from 56 to the four we use today [9]. Although
the railroads adopted the system in 1883, the United States did not legally adopt Standard Time
until the passage of the Standard Time Act on March 19, 1918.

Daylight-Saving  Time  (DST) 

Benjamin Franklin is credited with the invention of daylight-saving time. He discussed his
observations and ideas in an essay titled “An Economical Project,” which he wrote in 1784
while in Paris as an American delegate. The original purpose of the idea was to save on the
cost of lamp oil and candles in Paris [10]. Given that people’s day-to-day activities were
pegged to the clock even in the late eighteenth century, Franklin’s idea was to shift clocks
back an hour in the fall so that people would experience an additional hour of daylight during
the afternoon working hours (and have an hour less of daylight in the morning, when most people
were asleep). He estimated that in a single year, French shopkeepers could save one million
francs on candles alone. The United States adopted DST in 1918 and then repealed it after the
end of World War I because it was unpopular. President Johnson signed the Uniform Time Act of
1966, making DST law. The Energy Policy Act of 2005 amended the 1966 act so that DST starts on
the second Sunday in March and ends on the first Sunday in November [11].
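
The start and end dates under this schedule can be computed directly from the calendar. The sketch below assumes the second-Sunday-in-March and first-Sunday-in-November rule applies to the year given; the helper names `nth_sunday` and `us_dst_bounds` are illustrative.

```python
from datetime import date, timedelta

def nth_sunday(year: int, month: int, n: int) -> date:
    """Return the date of the n-th Sunday (n = 1, 2, ...) of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday == 0 ... Sunday == 6
    days_until_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_until_sunday + 7 * (n - 1))

def us_dst_bounds(year: int):
    """DST start and end dates, assuming the schedule described above applies."""
    return nth_sunday(year, 3, 2), nth_sunday(year, 11, 1)

start_2010, end_2010 = us_dst_bounds(2010)
```

A date-based rule that ignores these boundaries will see local-time events shift by an hour twice a year, which is one reason local time complicates temporal analysis.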

Coordinated  Universal  Time  (UTC) 

UTC is the worldwide system for civil time. Atomic clocks are kept in labs around the world.
The International Bureau of Weights and Measures uses these clocks to determine the
international standard UTC, which is accurate to almost a nanosecond (one billionth of a
second) per day. UTC is distributed by various radio stations and by the Global Positioning
System (GPS). Time zones in the U.S. and its territories are set as hour offsets from UTC,
though not all countries keep the same offset from UTC year round. The United Kingdom is a
special case: because Greenwich is located there, UTC is the local standard time [11].

With an understanding of the philosophical and computational views of time and the different
ways time is calculated and observed, it is clear why understanding temporal events can be
challenging. That is why a simple system like RBI could be so important to analysts in the
field today. The RBI methodology is simple, effective, and has numerous DoD applications.

1.3  Application  to  DoD 

Intelligence agencies and military command staffs must view massive amounts of data in order
to categorize the data and place it in context. Placing data in context transforms it into
information. This transformation is a critical step in making informed decisions. RBI can help
transform data into information.

There is no panacea for intelligence. The RBI methodology does not replace human interaction;
rather, it requires it. It does not solve all collection or analysis requirements, nor is it
intended to. However, this methodology shows significant promise as a useful and effective tool
for analysts, intelligence agencies, and law enforcement.

Possible applications that directly support current DoD intelligence and analyst needs
include:

• Terrorist Activity: look for failed terrorist attempts, identify probable locations and
times of Improvised Explosive Device (IED) placements, attribute activity to certain
organizations, classify different social network activity, identify the planning phase of an
ongoing terrorist operation, sort large data sets quickly for relevant data, and identify
changes in operational tempo.

• Criminal Activity: find financial activity, classify the behavior of personnel in an
organization or of the organization itself, identify nonstandard activity, and determine and
classify social network activity.

• Nation State Activity: classify specific organizational activity, determine irregular
activity, alert analysts to indications and warnings, identify relevant data in large data
sets, and predict military movement.


1.4  Outline  of  this  Thesis 

Chapter 2 gives an overview of supporting and related work. Chapter 3 covers the mathematics
and theory behind the techniques and concepts used in the experiments for this thesis.
Chapter 4 describes the experiments conducted, the data sets, and any pre-processing done.
Chapter 5 discusses the results and lists ideas for future work. Chapter 6 presents closing
thoughts and conclusions.

IMT + ID - RDT +

POSITIVES: 

1. Systems are easily parallelizable with multiple timelines because timelines can be
reduced to the smallest common denominator of time

2. Easy for humans to comprehend (think of a calendar)

3.  Easy  to  model  for  computers 

4.  Good  for  workflow  analysis 

NEGATIVES: 

1. Anchored time is difficult to implement if the time given is not correct. Because of
this, it is easy to miss temporal correlations. One method to overcome this issue is a
time window. Another method, the one this thesis explores, is the use of regression
analysis in place of a sliding window.

2. Certain implementations can be difficult to model and become more complex when
time windows overlap.


Figure 1.1: Summary of Date/Time Line Systems

IMT + ID + RDT -

POSITIVES: 

1.  Easy  for  computers  and  humans  to  understand 

2. Concepts and models are well understood. A Turing machine is an example of a
state space machine.

3. Formality can ensure both completeness and correctness

NEGATIVES:

1. This approach is not easy to implement when evaluating multiple event state
machines, as the state transitions are not guaranteed to be the same length.

2. Longer temporal patterns can be more time consuming and result in heavy resource
or time penalties.

Figure 1.2: Summary of State Space and Formal Methods Modeling

IMT + ID + RDT

POSITIVES: 

1. These approaches are not time dependent, making implementation easier and faster.

2. The temporal logic system implemented is well understood and makes it easy to model
particular trends.

NEGATIVES:

1. Implementers need to understand propositional logic. This takes formal training.

2. Defining sub-events within larger events becomes more complicated.

Figure 1.3: Summary of Relative Sequential Chaining


Figure 1.4: Section of John Snow’s Map Showing the Location of the Water Pump Infected with
Cholera and the Resulting Deaths from the Epidemic. From [8].

CHAPTER 2:

Prior  and  Related  Work 


Understanding  events  and/or  activities  in  a  temporal  context  is  important  to  many  fields  in 
business,  science,  mathematics,  and  philosophy.  Through  the  years,  several  different  methods 
and  techniques  have  tried  to  capture  a  sense  of  activity  or  detect  significant,  relevant  events 
with  temporal  data.  Understanding  some  of  these  approaches  is  important  to  see  how  current 
research  is  conducted. 

This  chapter  covers  prior  work  in  these  areas. 

2.1  Survey  of  Temporal  Analysis  Research 

There have been decades of research on the topic of time. Below is a broad, though not
all-inclusive, review of articles and applications of temporal research. These papers are
grouped into the three views of time discussed in Chapter 1 (Date/Time Line Systems, State
Space and Formal Methods Modeling, and Relative Sequential Chaining). Many different techniques
have been used to research temporal data, but arguably all of this research, knowingly or
unknowingly, uses one of these views of time.


Table 2.1: Temporal Analysis Papers (Date/Time Line Systems)

Title | Year | Short Description
Logical Modeling Of Temporal Data [12] | 1987 | Discusses fundamentals of temporal data issues and defines a new type of temporal model.
Automated Temporal Reasoning About Reactive Systems [13] | 1996 | Helps define formal syntax and semantics for propositional temporal logic.
Visualization Of Spatio-Temporal Information In The Internet [14] | 2000 | Uses a dynamic temporal visualization framework for placing objects in time and space.
Discovering Calendar-Based Temporal Association Rules [15] | 2001 | Attempts to discover temporal association rules derived from calendar dates.
Work Rhythms: Analyzing Visualizations Of Awareness Histories Of Distributed Groups [16] | 2002 | Develops visualizations of business work flow and does histogram analysis of daily activity over time.
Rhythm Modeling, Visualizations And Applications [17] | 2003 | Is a refinement of their work done in 2002 using clustering techniques and different visualizations.
Visually Mining And Monitoring Massive Time Series [18] | 2004 | Product description of a developmental visualization and time series tool.
Mining And Visualizing The Evolution Of Subgroups In Social Networks [19] | 2006 | Recognizes the importance of temporal changes of online communities and discusses ways to model them.
Learning recurrent behaviors from heterogeneous multivariate time-series [20] | 2007 | Demonstrates the utility of learning meaningful patterns in multidimensional and heterogeneous data from information automatically collected from sensors worn by people.
Exploring Global Terrorism Data: A Web-Based Visualization Of Temporal Data [21] | 2008 | Develops visualization techniques to help analysts find interesting patterns in a Global Terrorism Database.
Google News Timeline [22] | 2009 | Innovative way to display news from different venues organized in a customizable temporal view.

Table 2.2: Temporal Analysis Papers (State Space and Formal Methods Modeling)

| Title | Year | Short Description |
|---|---|---|
| Mining Sequential Patterns: Generalizations And Performance Improvements [23] | 1996 | Gives an organization algorithm for itemsets. |
| Discovery of Frequent Episodes in Event Sequences [24] | 1997 | Develops a framework for discovering frequent episodic data. |
| Discovering Frequent Event Patterns With Multiple Granularities In Time Sequences [25] | 1998 | Discusses the ideas of an event structure and temporal granularity. |
| Knowledge-Based Event Detection In Complex Time Series Data [26] | 1999 | Uses medical sensor data to find and detect events in temporal data. |
| Correlation Mining Between Time Series Stream And Event Stream [27] | 2008 | Presents a new algorithm to correlate temporal data and events. |
| Temporal Mining For Interactive Workflow Data Analysis [28] | 2009 | Develops a state space approach for evaluating process control logs with workflow graphs. |

Table 2.3: Temporal Analysis Papers (Relative Sequential Chaining)

| Title | Year | Short Description |
|---|---|---|
| Mining Association Rules Between Sets Of Items In Large Databases [29] | 1993 | Introduces the notion of itemsets and how they can be applied to determining buying behavior. |
| Segmenting Time Series: A Survey And Novel Approach [30] | 1993 | Surveys three time series segmentation algorithms: sliding window, top-down, and bottom-up. The author states that a combination of sliding window and bottom-up yields drastically better results than any other combination. |
| Wide Area Traffic: The Failure Of Poisson Modeling [31] | 1995 | Discusses assumptions of Poisson regression and exceptions to those assumptions for network traffic. |
| Discovery Of Frequent Episodes In Event Sequences [24] | 1997 | Presents a framework for discovering frequent episodes in sequential data. |
| A Framework For Knowledge-Based Temporal Abstraction [32] | 1997 | Describes a domain-independent knowledge-based inference structure. |
| Rule Discovery From Time Series [33] | 1998 | Introduces two different problems: first, data clustering, and second, development of rule induction using these clusters. |
| Efficient Time Series Matching By Wavelets [34] | 1999 | Uses Discrete Wavelet Transform (DWT) to analyze and match time series data. |
| Event Detection From Time Series Data [35] | 1999 | Discusses time series data and defines a method to determine change point or event detection. |
| Learning Recurrent Behaviors From Heterogeneous Multivariate Time-Series [20] | 2007 | Develops a supervised model that creates an unsupervised learning algorithm of temporal activity for people in their homes. Tuning the unsupervised learning portion turned out to be difficult and severely affected system performance. |
| Data mining with Temporal Abstractions: learning rules from time series [36] | 2007 | Users develop formal temporal patterns using Allen’s temporal operators. Then their algorithm identifies events based on these formal patterns. |
| Unsupervised Pattern Mining From Symbolic Temporal Data [37] | 2007 | Builds a framework to view temporal concepts and differing data models for data mining using unsupervised learning methods. |
| Discovery Of Activity Patterns Using Topic Models [38] | 2008 | Uses modern Natural Language Processing (NLP) techniques to determine activity patterns. |
| Spatial-Temporal Causal Modeling For Climate Change Attribution [39] | 2009 | Develops a spatial-temporal regression model based on a Graphical Granger Model. |
| Spatial-Temporal Association Between Fine Particulate Matter and Daily Mortality [40] | 2009 | The authors investigate the spatial-temporal nature of pollution and mortality using a Bayesian framework. |

2.1.1  Date/Time  Line  Systems  Examples 

Combining the concepts of space and time is another way to look at data and discover relations. This method is important in information discovery of objects or events that have temporal and spatial relations. Figure 2.1 shows different ways to look at time (as a single event, as two moments in time, or as an interval, past or current) and how they apply to a space or location.

Figure 2.1: Framework for Dealing with Events that have Both a Temporal and Spatial Relation. From [14].

“Rhythm Modeling, Visualizations and Applications” builds on previous work in rhythm detection and describes algorithms to detect and model temporal patterns from online geolocation data. The tools the authors built generate visualizations that let users see their workflow processes. The tools use heuristics to determine threshold values, then cluster work events by minimizing Euclidean distance. Probability distributions are recalculated, and the process is repeated until the initial and refined estimates converge. The paper then discusses and evaluates several visualizations created by the program and proposes possible applications for these tools [17].

On April 20, 2009, Google announced Google News Timeline, which organizes news search results in a zoomable, graphical timeline (Figure 2.2). This web tool is another example of a Date/Time Line System. The view is an anchored, scalable calendar view with a selectable temporal granularity of day, week, month, year, or decade [22].


2.1.2  State  Space  and  Formal  Methods  Modeling  Example 

In “Temporal Mining for Interactive Workflow Data Analysis,” Berlingerio, Pinelli, Nanni, and Giannotti build upon work done in workflow mining. Their approach reads computer log data and builds temporal process models by grouping sets of execution statements with similar execution times, grouping the semantics of executions, and interfacing with domain experts to select the appropriate models. This method can help identify abnormalities in the logs and can, in some cases, generate new process models [28].


2.1.3  Relative  Sequential  Chaining  Examples 

In the early 1990s, regression was used to model network traffic. It was accepted that all network traffic arrival rates could be modeled using regression analysis. In their paper “Wide Area Traffic: The Failure of Poisson Modeling,” Paxson and Floyd discuss several statistical methods used to model computer network traffic. They found that Poisson linear regression does not adequately model all forms of network traffic. They state that Poisson distributions are valid only for modeling the arrival of user sessions, and that the protocols are too “bursty” and therefore have different time scales, which prevents the model from performing well. Again, temporal granularity is an issue and must be understood for the Poisson distribution to work well [31].

Figure 2.2: Screen Capture from Google’s News Timeline Tool

Das, Lin, and Mannila developed a method to create and evaluate rules for stock market prediction. They were able to generate rules based on exploratory induction of discrete time series data. The first step in their process was to use K-means clustering to classify stock data. They then developed an algorithm to discover simple rules from these different sequences. Their method created a large range of rules, some with limited value. To compensate for the large number of rules, they used the J-measure for rule ranking developed by Smyth and Goodman in 1991. They found their technique needed the help of human interpretation to find the most useful rules for the particular dataset [33].

Guralnik and Srivastava developed a data mining method to separate temporal data into events when the model changes over time. This technique is often called change-point detection. Their method requires the desired number of change points to be given, which can be a drawback. The take-away of this paper is that incremental optimization is not nearly as effective as global optimization over the whole set of data [35].

Another paper dealing with change-point detection combines two standard approaches: finding the change points given a desired number of change points, and using a best-fit curve to determine the interval between successive change points. The authors studied detecting change points by using Maximum Likelihood Estimation (MLE). If the number of change points is known beforehand, then the statistical likelihood, $L$, of the change points is equal to


L  = 


nli  *r 
Ell  rmo* 


(2.1) 


where $k$ is the number of change points, $m_i$ is the number of time points in segment $i$, and $n$ is the total number of points.

If the change points are not known, the maximum likelihood estimate of the $\theta_i$'s can be found by maximizing the likelihood $L$ over all possible sets of $\theta_i$'s or, equivalently, by minimizing $-2 \log L$, which is equivalent to


—  2  log  L 


Eti  mi  loS 

"  log(Eti  "bE) 


(2.2) 


Shahar  developed  a  general  framework  for  reusing  domain-independent  knowledge  for  solving 
temporal  abstraction  and  enabled  sharing  of  domain-specific  knowledge  with  other  tasks  in  the 
same  domain.  This  framework  has  been  used  in  several  different  areas  of  medical  research 
and  has  proven  useful  in  the  organization  of  his  temporal  work.  Specifically,  he  defines  five 
knowledge-based  temporal-abstraction  methods:  temporal-context  restriction,  vertical  temporal 
inference, horizontal inference, temporal interpolation, and temporal pattern matching [32].

Huynh, Fritz, and Schiele use Natural Language Processing (NLP) machine learning methods to automatically annotate users’ daily activity. Subjects wore two tracking devices for several days. The output from the devices was converted into documents of discrete activity labels. Using Latent Dirichlet Allocation, the documents were associated with these activity labels. The authors then showed how labeling of events could be done with unsupervised learning, though supervised learning yielded the best results. This technique has the potential to prove more useful and robust than other unsupervised learning algorithms because many of the techniques used in NLP are well understood [38].

CHAPTER 3:
Techniques


This chapter documents the techniques, concepts, and technical approaches used in the experiments for this thesis. It covers the fundamental concepts and terms necessary for a basic understanding of this research.

3.1  Generalized  Linear  Models  and  Logistic  Regression 

Generalized linear models (GLMs) are a set of models that approximate more complex phenomena. Linear models are an important class of probabilistic model. In the 1950s, logistic regression became an important tool in biostatistics and, today, is used in many areas of science, engineering, business, and economics [41].

3.1.1  Poisson  Linear  Models 

Within  the  GLM  there  are  special  sets  of  logistic  regression  models  for  univariate  response  data. 
A  Poisson  linear  regression  model  works  best  with  independent  count  data  such  as  the  number 
of calls to a call center [41]. In a Poisson linear model, the variance is a function of the mean.

Terms and Assumptions

Terms and assumptions in Poisson linear regression:

Covariates or Regressor Variables ($x_1, x_2, \ldots, x_k$) are the items one wishes to test against. For example, if one wanted to know the effect of certain drugs based on age, sex, and dose, the covariates would be age, sex, and dose.

Regression Coefficients ($\beta$) are the unknown model parameters that are calculated using the Poisson LM.

Response Variable ($y$) is the item of interest or collected data. This count could be the number of calls per day to a call center or the number of IED attacks in a given area per week.

\[ y = X\beta \tag{3.1} \]

Z Value - In the case of this test data, the higher the Z value the better. As shown in Chapter 4, the Z values are used to rank the individual rules for the reasons mentioned above. Additionally, the Z values are given in absolute terms because it is a logistic regression model.

P-value Pr(>|Z|) - The probability that an added rule appears useful when in fact it is not. Therefore, a P value close to zero indicates a good predictor. Data derived from Chapter 4 show extremely small P values. These values quickly round to zero as the Z value grows, as shown in the table below. This means P values cannot be used to prioritize the rules.


| Z value | P value |
|---|---|
| 0.5 | 0.6170 |
| 1 | 0.3173 |
| 2 | 0.0455 |
| 3 | 0.0027 |
| 4 | 6.3e-05 |
| 5 | 5.7e-07 |
| 6 | 1.9e-09 |
| 7 | 2.5e-12 |
| 8 | 1.2e-15 |
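The relationship between Z values and P values in the table can be checked with a short calculation. The following is a minimal sketch, assuming the P value is the two-sided tail probability of a standard normal distribution, Pr(>|Z|); the function name two_sided_p is hypothetical and is not part of the thesis code.

```python
import math


def two_sided_p(z):
    """Two-sided standard normal tail probability, Pr(>|Z|)."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) and stays accurate in the tail
    return math.erfc(abs(z) / math.sqrt(2.0))


for z in (0.5, 1, 2, 3, 4, 8, 40):
    print("Z = %5.1f   P = %.4g" % (z, two_sided_p(z)))
```

Under this assumption, a Z value of 40 gives a P value that a 64-bit float can only represent as zero, while the Z values themselves remain easy to compare. This is why ranking by Z is the more practical choice.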

Mean ($\mu$) The mean number of occurrences, or arithmetic mean.

\[ \mu(x) = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad i = 1, 2, \ldots, n \tag{3.2} \]

Maximum Likelihood Estimator (MLE) provides the parameter estimates under which the model best fits the observed data.

Independence - Two events $A$ and $B$ are independent when $P(A \cap B) = P(A)P(B)$.

The assumptions for Poisson regression are:

1. Observations are independent.

2. Variance and mean are equal:

\[ E(x_i) = \mu(x_i), \quad i = 1, 2, \ldots, n \tag{3.3} \]

3. A set of regressors $x_1, x_2, \ldots, x_k$ influences $\mu$ via the model.

/L  =  =  1,2,  •  •  •  ,n  (3.4) 


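With these assumptions in place we can set

\[ y = X\beta + \varepsilon \]

where

\[
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix}
x_{11} & x_{12} & \cdots & x_{1k} \\
x_{21} & x_{22} & \cdots & x_{2k} \\
\vdots & \vdots & & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{nk}
\end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_k \end{bmatrix}
\]

and $E(\varepsilon) = 0$.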


In order to find the Maximum Likelihood Estimator (MLE), we start with the log-likelihood

\[ L = \ln \mathcal{L}(y, \beta) = \sum_{i=1}^{n} \left[ -e^{x_i'\beta} + y_i x_i'\beta - \ln(y_i!) \right] \tag{3.5} \]

Using this equation, the MLE is found with a numerical search procedure such as iteratively reweighted least squares [42].
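To make the fitting step concrete, the following is a minimal sketch of Newton/IRLS iterations for the simplest case relevant to this thesis: one rule at a time, coded as a 0/1 covariate, plus an intercept. It assumes the log link implied by Equation 3.5 and a fixed number of iterations rather than a convergence test. It is not the thesis implementation, which used R; the function names and the sample counts are hypothetical.

```python
import math


def _information(xs, mus):
    """Entries of the 2x2 Fisher information X'WX with W = diag(mu)."""
    a = sum(mus)                                   # intercept/intercept
    b = sum(x * m for x, m in zip(xs, mus))        # intercept/rule
    c = sum(x * x * m for x, m in zip(xs, mus))    # rule/rule
    return a, b, c


def fit_poisson_rule(counts, in_rule, iterations=25):
    """Fit log(mu_i) = b0 + b1 * x_i, where x_i is 1 on days matching the rule.

    Returns (b1, standard error of b1, Z value of b1). Assumes that some
    days match the rule, some do not, and both groups have nonzero counts.
    """
    xs = [1.0 if flag else 0.0 for flag in in_rule]
    b0 = math.log(sum(counts) / float(len(counts)))  # intercept-only start
    b1 = 0.0
    for _ in range(iterations):
        mus = [math.exp(b0 + b1 * x) for x in xs]
        # Score vector X'(y - mu): the gradient of the log-likelihood
        g0 = sum(y - m for y, m in zip(counts, mus))
        g1 = sum(x * (y - m) for x, y, m in zip(xs, counts, mus))
        a, b, c = _information(xs, mus)
        det = a * c - b * b
        # Newton step: beta <- beta + (X'WX)^-1 X'(y - mu)
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    mus = [math.exp(b0 + b1 * x) for x in xs]
    a, b, c = _information(xs, mus)
    se_b1 = math.sqrt(a / (a * c - b * b))  # from the inverse information
    return b1, se_b1, b1 / se_b1


# Hypothetical daily counts; the last two days fall inside the rule.
counts = [10, 12, 11, 9, 30, 28]
in_rule = [False, False, False, False, True, True]
estimate, std_error, z_value = fit_poisson_rule(counts, in_rule)
print("estimate=%.4f  std.error=%.4f  z=%.2f" % (estimate, std_error, z_value))
```

For a single 0/1 covariate the result can be checked against the closed form: the estimate equals the log of the ratio of the in-rule mean to the out-of-rule mean.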
3.2 Program Design

Temporal  rules  and  data  manipulation  were  programmed  in  Python  version  2.6,  while  statistical 
analysis  was  done  in  R  version  2.10.0  using  the  Rpy  version  2.0.8  interface  to  send  commands 
to  and  retrieve  data  from  R. 

The Python temporal rules were created as class objects to allow for logical design of rules and polymorphic rule creation. In many cases, rules were combined to make other rules. For example, US Workweek is a combination of the common work days, Monday through Friday, with national holidays removed.

The  temporal  assumptions  in  this  thesis  are  as  follows.  One,  all  temporal  rules  relate  to  daily 
activity.  Two,  both  the  scope  of  the  rules  and  the  data  have  the  same  temporal  granularity, 
meaning  if  the  data  are  counts  in  days,  then  the  rules  are  applicable  to  days. 

3.2.1  Formatting  Data 

In some cases, the data had a finer temporal granularity than days, so the information was transformed to a daily format. For example, one dataset had a temporal granularity of seconds and was therefore transformed into a daily number of calls. This transformation ensured the rules and dataset had the same temporal measure (i.e., days). If the dataset were not set to a daily count, different temporal rules would have had to be created to match the granularity of the data. Since doing this would not have added to the experimental value of the thesis, the Israeli data was made to have the same temporal granularity as the ACTS data (i.e., day). That said, the rules could have been created to match any temporal granularity desired.
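As an illustration of this transformation, the sketch below collapses second-resolution timestamps into daily counts. It is a minimal example under assumed inputs (a list of datetime objects, one per call); the function name daily_counts and the sample timestamps are hypothetical and do not come from the thesis datasets.

```python
from collections import Counter
from datetime import date, datetime


def daily_counts(timestamps):
    """Count events per calendar day, discarding the time-of-day component."""
    return Counter(ts.date() for ts in timestamps)


# Hypothetical call records with a granularity of seconds.
calls = [
    datetime(1999, 4, 4, 2, 0, 1),
    datetime(1999, 4, 4, 2, 0, 59),
    datetime(1999, 4, 5, 13, 30, 0),
]
counts = daily_counts(calls)
for day in sorted(counts):
    print(day.isoformat(), counts[day])

# Days with no events simply report a count of zero.
print(counts[date(1999, 4, 6)])
```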

Data formatting was done using simple Python expressions. Modules were created for each dataset to make sure it conformed to the correct temporal dimension (i.e., day). Additionally, only the first instance of a unique rule was used. For example, the Muslim and Jewish workweeks are the same. Because the MuslimWorkweek rule was analyzed first, the JewishWorkweek rule was dropped from consideration. This will be discussed in greater detail in Chapter 4.

3.2.2  Developing  and  Verifying  Rules 

Rules were developed using a simple hypothesis: cultural norms and work patterns are predictive of human behavior. Therefore, rules can be developed and used to correlate with behavior such as daylight saving time (DST) activity or calls to a call center.




DST change dates were generated from the computer’s own time zone algorithm. The computer actually calculates over 500 different zones around the world. Many of these zones have the same DST transitions. For example, Atlantic.Bermuda, America.Denver, and America.New_York all have exactly the same DST transitions. Because of this duplication, only the first unique rule was tested, while all other matching rules were dropped from the rule list.²
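The de-duplication step can be sketched as follows. This is a minimal illustration, assuming each rule exposes the inRule(tval) method described in Section 3.3 and that two rules count as duplicates when they match exactly the same days of the year under study. The helper names rule_signature and unique_rules and the FixedDates stand-in class are hypothetical.

```python
from datetime import date, timedelta


class FixedDates:
    """Hypothetical stand-in for a rule: matches a fixed set of dates."""

    def __init__(self, name, dates):
        self.name = name
        self.dates = set(dates)

    def inRule(self, tval):
        return tval in self.dates


def rule_signature(rule, year):
    """Tuple of True/False values, one per day of the given year."""
    day, out = date(year, 1, 1), []
    while day.year == year:
        out.append(rule.inRule(day))
        day += timedelta(days=1)
    return tuple(out)


def unique_rules(rules, year):
    """Keep only the first rule seen for each distinct signature."""
    seen, kept = set(), []
    for rule in rules:
        sig = rule_signature(rule, year)
        if sig not in seen:
            seen.add(sig)
            kept.append(rule)
    return kept


us_dst = [date(1999, 4, 4), date(1999, 10, 31)]
rules = [
    FixedDates("America.Denver", us_dst),
    FixedDates("America.New_York", us_dst),    # same transitions, dropped
    FixedDates("Other", [date(1999, 3, 28)]),  # hypothetical distinct rule
]
print([r.name for r in unique_rules(rules, 1999)])
```

Comparing full-year signatures rather than rule names means the filter also catches duplicates that arise by coincidence, such as two workweek rules that happen to select the same days.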

Some of the rules used in these experiments were developed from workday patterns (standard workweek minus holidays) and from regional religious holidays in the Jewish and Muslim calendars. The Western workweek starts on Monday and ends on Friday, while the Israeli and Muslim workweeks start on Sunday and end on Thursday. Holidays are unique to a country and/or a geographical region. An example of a unique holiday would be Independence Day (July 4) in the United States. Of course, there are other holidays that multiple cultures share, such as New Year’s Day. However, taken as a whole, the cultural work year is unique to every country and therefore can be used to correlate data to location. Even within predominantly Muslim countries, where holidays are very closely tied to religious observance, there are differences that allow for differentiation.

Again, the rules chosen were based on their tight binding to a specific cultural behavior, such as observed holidays and DST. However, these rules could just as easily have been known patterns of a military unit’s operational tempo, spending patterns of certain demographics, or timing of terrorist attacks.

3.3  Implementation 

The actual rules are implemented as instances of classes that subclass the TimeRule abstract superclass in the Python file time_rules.py.

The abstract superclass defines a simple abstract rule that matches no time events:

    class TimeRule:
        DESCRIPTION = "Abstract rule class."

        def inRule(self, tval):
            """The default inRule is that it is never in the rule.
            tval is in local time."""
            return False

        def __str__(self):
            return "%s" % (self.__class__.__name__)

² As it turned out, this pre-filtering step was not necessary because R could do stepwise regression, which would have had the same effect by only adding rules that change the model’s accuracy. Because adding a duplicate rule would not change the model’s accuracy, it would not have been added to the list.

Subclasses of this abstract class implement specific rules. For example, this rule implements the Israeli work days:


    class Israeli_WorkDays(TimeRule):
        """Return True if it is a day on which somebody works in Israel."""

        def __init__(self):
            self.Israel_Holiday_Rule = Jewish_Holidays_1999
            self.WEEKDAY_RULE = IsraeliWeekday()

        def inRule(self, tval):
            # A day is an Israeli workday if it is a weekday and not a holiday.
            return (self.WEEKDAY_RULE.inRule(tval)
                    and not self.Israel_Holiday_Rule.inRule(tval))


The Python list rule_list holds every rule that is to be tested. Rules are added as objects, which are typically instances of a rule class. This allows a single generic class to generate many specialized rules, although some rule classes are used to generate just a single instance. Adding instances to this list is straightforward:

    rule_list.append(Israeli_WorkDays())

A special Almanac class allows the creation of rules (instances) that match a specific day of the year. For example, this code creates a rule that matches the actual US Independence Day in 1999 and adds it to the rule_list:

    rule_list.append(Almanac("US Independence Day 1999", [date(1999, 7, 4)], True))

With this in place, multiple rules can be combined to make a larger rule. For example, US_WorkDays is a combination of the WesternWeekday rule and the days that are not US holidays.


    class US_WorkDays(TimeRule):
        def __init__(self):
            self.US_HOLIDAY_LIST = [US_NewYears(), US_MartinLuther(),
                                    US_WashingtonBDay(), US_MemorialDay(),
                                    US_IndependanceDayObserved(), US_LaborDay(),
                                    US_ColumbusDay(), US_VeteriansDayObserved(),
                                    US_Thanksgiving(), US_ChristmasDayObserved()]
            self.WEEKDAY_RULE = WesternWeekday()

        def inRule(self, tval):
            """It is not a workday if it is a US holiday.
            Go through the list of US holidays and see if any of the rules match.
            If so, return False."""
            for r in self.US_HOLIDAY_LIST:
                if r.inRule(tval):
                    return False
            # It is a workday if the day is a weekday.
            return self.WEEKDAY_RULE.inRule(tval)

    rule_list.append(US_WorkDays())
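For readers who want to run the composition pattern end to end, the following is a self-contained sketch. It is an illustration only: WesternWeekday and Almanac are simplified, hypothetical re-implementations (the thesis versions are in the Appendix), this Almanac omits the third argument used above, and WorkDays is a generic name introduced for this example.

```python
from datetime import date


class TimeRule:
    """Abstract rule that matches no day."""

    def inRule(self, tval):
        return False


class WesternWeekday(TimeRule):
    """Matches Monday through Friday."""

    def inRule(self, tval):
        return tval.weekday() < 5  # Monday is 0, Sunday is 6


class Almanac(TimeRule):
    """Matches an explicit list of dates (simplified two-argument form)."""

    def __init__(self, description, dates):
        self.description = description
        self.dates = set(dates)

    def inRule(self, tval):
        return tval in self.dates


class WorkDays(TimeRule):
    """A weekday rule with every day matched by a holiday rule removed."""

    def __init__(self, weekday_rule, holiday_rules):
        self.weekday_rule = weekday_rule
        self.holiday_rules = list(holiday_rules)

    def inRule(self, tval):
        if any(rule.inRule(tval) for rule in self.holiday_rules):
            return False
        return self.weekday_rule.inRule(tval)


# July 4, 1999 fell on a Sunday, so the holiday was observed on Monday, July 5.
observed = Almanac("US Independence Day 1999 (observed)", [date(1999, 7, 5)])
us_workdays = WorkDays(WesternWeekday(), [observed])

for day in (date(1999, 7, 2), date(1999, 7, 4), date(1999, 7, 5), date(1999, 7, 6)):
    print(day.isoformat(), int(us_workdays.inRule(day)))
```

Evaluating a rule over every day of the year in this way produces the 0/1 indicator column that serves as a covariate in the regression.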


The  following  rules  were  used: 

Monday
Tuesday
Wednesday
Thursday
Friday
Saturday
Sunday
January
February
March
April
May
June
July
August
September
October
November
December
WesternWeekday
MuslimWeekday
IsraeliWeekday
US_NewYears
US_MartinLuther
US_WashingtonBDay
US_MemorialDay
US_IndependanceDayObserved
US_LaborDay
US_ColumbusDay
US_VeteriansDayObserved
US_Thanksgiving
US_ChristmasDayObserved
US_WorkDays
US_BimonthlyPay
Israeli_WorkDays
Muslim_WorkDays
DSTRule(Africa/Abidjan)
DSTRule(Africa/Addis_Ababa)
DSTRule(Africa/Algiers)
DSTRule(Africa/Blantyre)
DSTRule(Africa/Cairo)
DSTRule(Africa/Ceuta)
DSTRule(Africa/Tunis)
DSTRule(Africa/Windhoek)
DSTRule(America/Adak)
DSTRule(America/Anchorage)
DSTRule(America/Anguilla)
DSTRule(America/Araguaina)
DSTRule(America/Argentina/Buenos_Aires)
DSTRule(America/Argentina/Catamarca)
DSTRule(America/Argentina/San_Luis)
DSTRule(America/Asuncion)
DSTRule(America/Atikokan)
DSTRule(America/Belem)
DSTRule(America/Belize)
DSTRule(America/Boa_Vista)
DSTRule(America/Boise)
DSTRule(America/Campo_Grande)
DSTRule(America/Cancun)
DSTRule(America/Caracas)
DSTRule(America/Dawson)
DSTRule(America/Dawson_Creek)
DSTRule(America/Detroit)
DSTRule(America/Fortaleza)
DSTRule(America/Glace_Bay)
DSTRule(America/Godthab)
DSTRule(America/Goose_Bay)
DSTRule(America/Havana)
DSTRule(America/Indiana/Indianapolis)
DSTRule(America/Indiana/Knox)
DSTRule(America/Miquelon)
DSTRule(America/Montevideo)
DSTRule(America/Noronha)
DSTRule(America/Recife)
DSTRule(America/Resolute)
DSTRule(America/Santiago)
DSTRule(America/Sao_Paulo)
DSTRule(America/Scoresbysund)
DSTRule(America/St_Johns)
DSTRule(Antarctica/Casey)
DSTRule(Antarctica/Davis)
DSTRule(Antarctica/DumontDUrville)
DSTRule(Antarctica/Mawson)
DSTRule(Antarctica/McMurdo)
DSTRule(Asia/Almaty)
DSTRule(Asia/Amman)
DSTRule(Asia/Anadyr)
DSTRule(Asia/Aqtau)
DSTRule(Asia/Ashgabat)
DSTRule(Asia/Baghdad)
DSTRule(Asia/Baku)
DSTRule(Asia/Beirut)
DSTRule(Asia/Colombo)
DSTRule(Asia/Damascus)
DSTRule(Asia/Dubai)
DSTRule(Asia/Gaza)
DSTRule(Asia/Irkutsk)
DSTRule(Asia/Jayapura)
DSTRule(Asia/Jerusalem)
DSTRule(Asia/Kabul)
DSTRule(Asia/Kathmandu)
DSTRule(Asia/Krasnoyarsk)
DSTRule(Asia/Magadan)
DSTRule(Asia/Nicosia)
DSTRule(Asia/Novosibirsk)
DSTRule(Asia/Rangoon)
DSTRule(Asia/Sakhalin)
DSTRule(Asia/Tbilisi)
DSTRule(Asia/Tehran)
DSTRule(Asia/Yakutsk)
DSTRule(Asia/Yekaterinburg)
DSTRule(Atlantic/Canary)
DSTRule(Atlantic/Cape_Verde)
DSTRule(Atlantic/South_Georgia)
DSTRule(Atlantic/Stanley)
DSTRule(Australia/Adelaide)
DSTRule(Australia/Currie)
DSTRule(Australia/Darwin)
DSTRule(Australia/Eucla)
DSTRule(Australia/Lord_Howe)
DSTRule(Europe/Moscow)
DSTRule(Europe/Riga)
DSTRule(Pacific/Apia)
DSTRule(Pacific/Chatham)
DSTRule(Pacific/Easter)
DSTRule(Pacific/Efate)
DSTRule(Pacific/Enderbury)
DSTRule(Pacific/Fakaofo)
DSTRule(Pacific/Fiji)
DSTRule(Pacific/Funafuti)
DSTRule(Pacific/Gambier)
DSTRule(Pacific/Kiritimati)
DSTRule(Pacific/Marquesas)
DSTRule(Pacific/Midway)
DSTRule(Pacific/Norfolk)
DSTRule(Pacific/Pitcairn)
DSTRule(Pacific/Tongatapu)


Programming rule details can be found in the Appendix.

CHAPTER 4:
Experiments


This chapter details the datasets and results of the experiments. Three datasets were used: the ACTS data, the Israeli bank data, and the GTD data. All of the datasets overlapped in 1999. Because of this overlap, only one temporal set of rules was needed.

In conducting the experiments, the Poisson regression was run against each rule independently, ensuring the rules did not over-fit the data. While not shown in the data tables below, the Poisson regressions were also run using a stepwise function, meaning rules are added to and removed from the list based on the significance of their modeling effect. Because duplicate rules (such as countries observing the same DST changes) were removed before running the stepwise regression, the results were not significantly different. However, if pre-filtering had not been done, stepwise regression would have automatically removed duplicate rules. In the future this might be a better method because it eliminates a preprocessing/filtering step.

Again, as discussed in Chapter 3, it is difficult to show a substantial difference in the P value when the Z values are so high; therefore, Z values, not P values, were used to prioritize the individual rules. In the ACTS and Israeli Bank datasets, rules were created with the data in mind and have very high Z scores. Because no rules were created for the GTD dataset, its prioritizations have lower Z scores and are less meaningful.

Upon  analysis,  the  findings  support  the  RBI  methodology.  A  rule  bank  of  known  temporal  data 
can  be  used  to  correlate  temporal  activity  to  different  data. 


4.1  NIST  Data 

This dataset was collected by the National Institute of Standards and Technology (NIST). It was collected from the Automated Computer Time Service (ACTS), which distributes Coordinated Universal Time (UTC) to computer systems via analog modems over ordinary telephone lines, operating mainly from Boulder, Colorado. Figure 4.1 shows ACTS timing requests for 1999. These data were taken from a dataset that had 10 years’ worth of ACTS data [3].

Figure 4.1: ACTS Data, Number of Phone Calls Per Day. (Plot title: NIST ACTS Data; x-axis: Day of Year.)


4.1.1  Results 

As seen in the Summary of Findings table for the ACTS data (Table 4.1), DSTRule.America.Adak was
the most strongly correlated rule; the dates for this rule are 4/4/99 and 10/31/99. Adak is
a city in Alaska, and this rule represents the DST change dates for most of the United States,
including Colorado, where ACTS is located. As seen in Figure 4.1, the two highest data spikes
are on 4/4/99 (23192 calls that day) and 10/31/99 (30024 calls that day). The Z value for this
rule was 219.56, which means there is a strong correlation between this rule and where the data
originated. The next closest rule, October, has a Z value of 177.29; this makes sense, as the
second DST change is in October. The third rule is February, with a value of 153.2. At first look
this makes no sense; however, the rules are ranked by their absolute Z value. The estimate for
this rule is negative, which means the rule demonstrates negative association. In other words,
February is third on the list, but in reality it is strongly anticorrelated with the data. Again,
as discussed in Chapter 3, Z values of more than 40 have a P value that truncates to zero because
64 bits is not enough precision for values so small.
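
The truncation of P values to zero can be reproduced with a short sketch. It assumes the reported
P value is the two-sided normal tail probability for the Z value and evaluates it in 64-bit
floating point with the standard library; the function name two_sided_p_value is hypothetical, and
the thesis itself obtained its P values from R.

```python
import math

def two_sided_p_value(z):
    """Two-sided normal tail probability Pr(>|z|) as a 64-bit float."""
    # For a standard normal Z, Pr(|Z| > z) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

# A conventional value, a large value, and two values past the underflow point.
for z in (1.96, 30.0, 40.0, 219.56):
    print(z, two_sided_p_value(z))
```

For a Z value of 30 the result is tiny but still representable, while for 40 and above it
underflows to exactly zero, which is why the rules are ordered by Z value instead.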

In this case the RBI methodology works and shows a significant correlation to DST in the United
States for this dataset. It is interesting to note that DSTRule.Africa.Ceuta and
DSTRule.America.Havana are only one day from each other. DSTRule.America.Havana has a Z value
of 130.57.

Table 4.1: Summary of Findings - ACTS Data - Every Rule Independent

Rule Name                                 Estimate Std. Error  ABS(Z Value)  Pr(>|z|)
DSTRule.America.Adak.                     0.9586   0.0044      219.56        0.0000
October                                   0.2917   0.0016      177.29        0.0000
February                                 -0.345    0.0023      153.2         0.0000
DSTRule.Africa.Ceuta.                     0.6894   0.005       138.48        0.0000
DSTRule.America.Havana.                   0.6596   0.0051      130.57        0.0000
DSTRule.Africa.Windhoek.                  0.5353   0.0054      99.7          0.0000
Muslim_WorkDays                           0.111    0.0012      96.14         0.0000
MuslimWeekday                             0.1056   0.0012      90.49         0.0000
Israeli_WorkDays                          0.1012   0.0011      89.11         0.0000
DSTRule.Africa.Cairo.                    -0.7301   0.01        72.73         0.0000
DSTRule.Asia.Amman.                      -0.6142   0.0095      64.81         0.0000
DSTRule.America.Araguaina.               -0.6003   0.0094      63.78         0.0000
Saturday                                 -0.0963   0.0015      62.99         0.0000
Monday                                    0.0851   0.0014      59.4          0.0000
September                                 0.105    0.0018      58.38         0.0000
Friday                                   -0.0811   0.0015      53.79         0.0000
August                                    0.0862   0.0018      48.27         0.0000
Tuesday                                   0.0648   0.0014      44.92         0.0000
Sunday                                    0.0641   0.0014      44.41         0.0000
July                                      0.0745   0.0018      41.49         0.0000
DSTRule.Pacific.Fiji.                    -0.3351   0.0083      40.62         0.0000
US_ChristmasDayObserved                   0.2946   0.0085      34.56         0.0000
April                                     0.0624   0.0018      34.08         0.0000
DSTRule.America.Goose_Bay.                0.2086   0.0063      33.08         0.0000
US_VeteriansDayObserved                   0.2667   0.0086      30.85         0.0000
US_MemorialDay                           -0.1363   0.0047      28.74         0.0000
US_WorkDays                               0.0319   0.0011      28.72         0.0000
DSTRule.America.Godthab.                  0.1409   0.0065      21.62         0.0000
DSTRule.Asia.Tehran.                      0.137    0.0065      20.97         0.0000
US_Thanksgiving                           0.181    0.009       20.06         0.0000
US_ColumbusDay                            0.168    0.0091      18.5          0.0000
DSTRule.Asia.Jerusalem.                   0.1118   0.0066      16.9          0.0000
Wednesday                                -0.0244   0.0015      16.4          0.0000
US_LaborDay                               0.1463   0.0092      15.94         0.0000
DSTRule.Asia.Baghdad.                     0.1019   0.0066      15.34         0.0000
Thursday                                 -0.0225   0.0015      15.12         0.0000
DSTRule.Australia.Currie.                 0.0998   0.0067      15.01         0.0000
WesternWeekday                            0.0165   0.0011      14.4          0.0000
DSTRule.Antarctica.McMurdo.               0.0954   0.0067      14.3          0.0000
DSTRule.Pacific.Tongatapu.                0.1262   0.0093      13.61         0.0000
DSTRule.America.Santiago.                 0.0833   0.0067      12.42         0.0000
DSTRule.Asia.Gaza.                        0.0805   0.0067      11.98         0.0000
US_WashingtonBDay                         0.1049   0.0094      11.19         0.0000
DSTRule.America.Argentina.Buenos_Aires.   0.0956   0.0094      10.16         0.0000
DSTRule.America.Asuncion.                 0.0283   0.0069      4.1           0.0000
US_IndependanceDayObserved                0.0299   0.0097      3.08          0.0021
March                                    -0.0052   0.0019      2.8           0.0051
June                                     -0.0053   0.0019      2.8           0.0051
DSTRule.Atlantic.Stanley.                 0.0182   0.0069      2.63          0.0086
May                                       0.0044   0.0018      2.38          0.0171
January                                  -19.6305  18.8083     1.04          0.2966
US_BimonthlyPay                          -16.5472  16.5224     1             0.3166
DSTRule.Asia.Damascus.                   -16.5472  16.5224     1             0.3166
US_NewYears                              -16.5445  23.3662     0.71          0.4789
US_MartinLuther                          -16.5445  23.3662     0.71          0.4789


4.2 Israel Bank Center Call Data

This data was downloaded from http://iew3.technion.ac.il/serveng/callcenterdata. The data is
an archive of all the calls handled by a bank call center in Israel for the year 1999. The original
data detailed information about the calls down to the second. However, the data used for this
experiment was modified to count only the number of calls per day. During weekdays (Sunday
to Thursday), the call center was staffed from 7:00 am to midnight local time. The call center
closed at 2:00 pm on Friday and reopened at around 8:00 pm on Saturday. The automated
service operated 7 days a week, 24 hours a day [4].
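
The reduction from per-call records to daily counts can be sketched as follows. This is only an
illustration: it assumes each call has already been parsed into a Python datetime, and the
function name calls_per_day and the example timestamps are hypothetical, not part of the original
data preparation.

```python
from collections import Counter
from datetime import datetime

def calls_per_day(call_times):
    """Collapse per-call datetimes into sorted (date, count) pairs."""
    counts = Counter(t.date() for t in call_times)
    return sorted(counts.items())

# Hypothetical call timestamps, down to the second.
example_calls = [
    datetime(1999, 1, 3, 7, 0, 12),
    datetime(1999, 1, 3, 23, 59, 59),
    datetime(1999, 1, 4, 9, 30, 0),
]
daily = calls_per_day(example_calls)
```

Days with no calls at all do not appear in the output of this sketch; they would need to be filled
in with a count of zero before fitting a model over every day of the year.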


4.2.1  Results 

From the Summary of Findings table for the bank call center data (Table 4.2), the most correlated
rule was Israeli_WorkDays. The Z value was 271.95; again, this shows a strong correlation between
the rule and where the data originated, which was Israel. The next rule is MuslimWeekday, which
is the same as the IsraeliWeekday rule. Because of the preprocessing step that removes duplicate
rules, the IsraeliWeekday rule was not examined. Muslim_WorkDays is next and is closely bound to
the Israeli data because the two share the same workweek rule. The next two rules on the list are
Saturday and Friday. Note that both of these rules have negative estimates, meaning there
is a negative correlation with the data. This makes sense, as the Israeli weekend is Friday and
Saturday, so one would expect the calling activity to show a negative correlation.

Table 4.2: Summary of Findings - Bank Data - Every Rule Independent

Rule Name                                 Estimate Std. Error  ABS(Z Value)  Pr(>|z|)
Israeli_WorkDays                          1.3061   0.0048      271.95        0.0000
MuslimWeekday                             1.4268   0.0053      270           0.0000
Muslim_WorkDays                           1.3182   0.0049      266.3         0.0000
Saturday                                 -1.9038   0.0098      194.87        0.0000
Friday                                   -0.9061   0.0061      148.11        0.0000
Sunday                                    0.3806   0.0038      100.64        0.0000
WesternWeekday                            0.3473   0.0036      95.86         0.0000
US_WorkDays                               0.2766   0.0034      81.48         0.0000
Tuesday                                   0.2967   0.0039      76.43         0.0000
Thursday                                  0.2744   0.0039      70.2          0.0000
Monday                                    0.2709   0.0039      69.22         0.0000
Wednesday                                 0.2453   0.0039      62.15         0.0000
DSTRule.America.Goose_Bay.               -1.821    0.0503      36.22         0.0000
DSTRule.America.Asuncion.                -1.7669   0.0489      36.11         0.0000
DSTRule.America.Santiago.                -1.7645   0.0489      36.1          0.0000
DSTRule.America.Araguaina.               -1.7065   0.0475      35.94         0.0000
DSTRule.Atlantic.Stanley.                -2.3305   0.0648      35.94         0.0000
DSTRule.America.Godthab.                 -1.6776   0.0468      35.84         0.0000
DSTRule.Asia.Baghdad.                    -1.3499   0.0397      33.96         0.0000
January                                  -0.1928   0.0058      33.03         0.0000
September                                -0.1648   0.0059      28.14         0.0000
US_BimonthlyPay                          -0.7978   0.0302      26.43         0.0000
US_IndependanceDayObserved                0.5622   0.0217      25.9          0.0000
DSTRule.America.Argentina.Buenos_Aires.  -1.6688   0.066       25.3          0.0000
April                                    -0.1422   0.0058      24.52         0.0000
DSTRule.Antarctica.McMurdo.               0.4049   0.0166      24.33         0.0000
August                                    0.1193   0.0051      23.29         0.0000
DSTRule.Asia.Tehran.                      0.3737   0.0169      22.11         0.0000
US_MemorialDay                            0.243    0.0115      21.18         0.0000
DSTRule.Africa.Windhoek.                  0.3456   0.0171      20.17         0.0000
US_VeteriansDayObserved                   0.4332   0.0231      18.72         0.0000
DSTRule.Australia.Currie.                 0.3188   0.0174      18.37         0.0000
US_ChristmasDayObserved                  -0.7159   0.041       17.46         0.0000
US_NewYears                              -0.7092   0.0409      17.36         0.0000
DSTRule.America.Adak.                     0.3037   0.0175      17.36         0.0000
October                                  -0.094    0.0056      16.8          0.0000
DSTRule.Asia.Jerusalem.                  -0.4033   0.0248      16.25         0.0000
DSTRule.Asia.Damascus.                   -0.3996   0.0248      16.13         0.0000
DSTRule.Africa.Ceuta.                     0.2786   0.0177      15.73         0.0000
DSTRule.Pacific.Tongatapu.                0.2789   0.025       11.17         0.0000
US_ColumbusDay                            0.2783   0.025       11.14         0.0000
DSTRule.Asia.Amman.                      -0.2479   0.023       10.79         0.0000
DSTRule.America.Havana.                  -0.2285   0.0228      10.04         0.0000
US_MartinLuther                           0.253    0.0253      10            0.0000
May                                       0.0512   0.0053      9.71          0.0000
US_LaborDay                               0.2297   0.0256      8.97          0.0000
DSTRule.Asia.Gaza.                        0.1578   0.0188      8.4           0.0000
US_WashingtonBDay                         0.2158   0.0258      8.37          0.0000
June                                      0.0412   0.0054      7.67          0.0000
July                                      0.0363   0.0053      6.86          0.0000
DSTRule.Pacific.Fiji.                     0.1255   0.0191      6.57          0.0000
March                                     0.0301   0.0053      5.67          0.0000
US_Thanksgiving                           0.1385   0.0268      5.17          0.0000
February                                 -0.0241   0.0057      4.23          0.0000
DSTRule.Africa.Cairo.                     0.0573   0.0198      2.9           0.0037

Again, the RBI methodology showed the tight binding to the Israeli workday rule and a negative
correlation to the Israeli weekend. In this case, the RBI methodology not only explains which
rules correlate but also shows which strongly do not correlate. From an analysis
perspective, a negative correlation can be just as informative as a strong positive correlation.
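
The duplicate-rule preprocessing mentioned in the results above is not listed in this chapter, so
the following is only a sketch of one way it could work. It assumes each rule has been evaluated
into a True/False column with one entry per day, and it keeps the first rule seen for each
distinct column. The function name drop_duplicate_rules and the one-week example are hypothetical.

```python
def drop_duplicate_rules(rule_columns):
    """Keep the first rule for each distinct True/False column.

    rule_columns is a list of (rule name, per-day indicator sequence) pairs.
    Returns (kept, dropped): kept is a list of surviving pairs, and dropped
    maps each removed rule name to the kept rule it duplicates.
    """
    first_seen = {}
    kept = []
    dropped = {}
    for name, column in rule_columns:
        key = tuple(bool(v) for v in column)
        if key in first_seen:
            dropped[name] = first_seen[key]
        else:
            first_seen[key] = name
            kept.append((name, column))
    return kept, dropped

# One week of tm_wday values (0 = Monday ... 6 = Sunday) as example days.
week = range(7)
example_rules = [
    ("MuslimWeekday", [d in (6, 0, 1, 2, 3) for d in week]),
    ("IsraeliWeekday", [d in (6, 0, 1, 2, 3) for d in week]),
    ("WesternWeekday", [d in (0, 1, 2, 3, 4) for d in week]),
]
kept_rules, dropped_rules = drop_duplicate_rules(example_rules)
```

With this ordering IsraeliWeekday is dropped as a duplicate of MuslimWeekday, which matches the
behavior described for the bank data; which of two identical rules survives depends only on the
order in which they are listed.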

4.3  Global  Terrorism  Database 

The Global Terrorism Database (GTD) is an open-source database with records starting in 1970
and ending in 2007. There are over 80000 events in the database, and every event includes where,
when, and how it occurred. The recorded data was derived from open-source material
such as books, journals, and legal documents. The data from 1970-1997 was collected by the
Pinkerton Global Intelligence Services (PGIS), a private security agency. Cases between 1998
and 2007 were developed through a partnership between the Center for Terrorism and Intelligence
Studies (CETIS) and the Study of Terrorism and Responses to Terrorism (START) groups. Additional
events were added from the Conflict Archive on the Internet; the Australian Turkish Media
Group, Armenian Terrorism: The Past, the Present, the Prospects, by Francis Hyland; the National
Abortion Federation; and the Further Submission and Responses by the ANC to Questions
Raised by the Commission for Truth and Reconciliation, 5/12/97.


[Figure 4.3 plot: Global Terrorism Database Data]

Figure 4.3: Global Terrorism Database Data - Number of Attacks Per Day


4.3.1  Results 

The GTD data is the control dataset. There were no rules created or designed for this data.
If the control is correct, there are no rules that stand out as particularly strong for this data.
By comparison, the top rules for the other two datasets both had Z scores over 200. This is
reasonable: of the incidents in 1999, 32% of the attacks happened in Algeria, Colombia, India,
and Turkey. None of these countries' workweeks are coded in the rule set. Twenty-three percent
of the attacks happened in the Middle East and Northern Africa; this region is predominantly
Muslim. However, it is not reasonable to believe global or even regional terrorist activity follows
a regular workweek pattern unless it is to target populated areas. Again, this would not follow
a holiday schedule, per se, and this is reflected in the low Z scores.

The top rule is June with a Z value of 7.31, which is much lower than the values over 200 seen
for the top rules in the other datasets. This is interesting: looking at the graph (Figure 4.3),
both June and March show more consistent activity. The second rule,
DSTRule.America.Argentina.Buenos_Aires, is interesting and points to an important issue when using
automated systems such as the one discussed in this thesis. It seems that in 1999 several countries
adopted a standard daylight saving change in the later part of the year.
DSTRule.America.Argentina.Buenos_Aires is an example of this. This rule has only one change day,
10/2/99, which also happens to correspond to the high spike shown on the graph. If an analyst is
not paying attention, he or she might think there is a correlation between global terrorist attacks
and daylight saving in Argentina, when in fact this is an anomaly, as it is a rule with only one
day. The likelihood of random rules showing a high correlation decreases significantly as the rules
become more complex (i.e., cover more than one day). The reason similar activity was not seen in
the other datasets is easily explained: the temporal rules were created with those datasets in
mind, so they should have high Z values and overshadow a single day with a high spike.
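
One way to guard against this kind of single-day anomaly is to flag rules that match very few
days before their Z values are interpreted. The sketch below is an illustration only; the function
name rules_with_few_days, the min_days threshold, and the example columns are assumptions and are
not part of the RBI code.

```python
def rules_with_few_days(rule_columns, min_days=2):
    """Return {rule name: matching day count} for rules below min_days.

    rule_columns is a list of (rule name, per-day True/False sequence) pairs.
    """
    flagged = {}
    for name, column in rule_columns:
        days = sum(1 for v in column if v)
        if days < min_days:
            flagged[name] = days
    return flagged

# Hypothetical ten-day window: one rule hits a single day, one hits three.
example_columns = [
    ("SingleDayRule", [False] * 4 + [True] + [False] * 5),
    ("ThreeDayRule", [True, False, False, True, False, False, True, False, False, False]),
]
suspect_rules = rules_with_few_days(example_columns)
```

A flagged rule is not necessarily wrong, but a high Z value on it deserves a second look by the
analyst, as in the Argentina example above.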

Table 4.3: Summary of Findings - GTD Data - Every Rule Independent

Rule Name                                 Estimate Std. Error  ABS(Z Value)  Pr(>|z|)
June                                      0.5834   0.0799      7.31          0.0000
DSTRule.America.Argentina.Buenos_Aires.   1.6189   0.2373      6.82          0.0000
March                                     0.5089   0.081       6.28          0.0000
DSTRule.America.Asuncion.                 1.257    0.2019      6.22          0.0000
January                                  -0.6298   0.1301      4.84          0.0000
DSTRule.America.Araguaina.                1.03     0.2253      4.57          0.0000
July                                      0.3708   0.0853      4.35          0.0000
February                                 -0.5711   0.1332      4.29          0.0000
September                                -0.4815   0.1237      3.89          0.0001
October                                   0.2727   0.0886      3.08          0.0021
US_Thanksgiving                           0.9189   0.3345      2.75          0.0060
DSTRule.Asia.Baghdad.                     0.6687   0.2687      2.49          0.0128
DSTRule.Australia.Currie.                 0.6687   0.2687      2.49          0.0128
DSTRule.Pacific.Fiji.                     0.6687   0.2687      2.49          0.0128
May                                      -0.2465   0.1098      2.25          0.0247
April                                    -0.2475   0.1115      2.22          0.0265
DSTRule.Antarctica.McMurdo.               0.5938   0.2787      2.13          0.0331
Saturday                                  0.1505   0.0748      2.01          0.0444
DSTRule.Africa.Cairo.                    -1.9803   1.0004      1.98          0.0477
DSTRule.Atlantic.Stanley.                -1.2864   0.7076      1.82          0.0691
DSTRule.Asia.Gaza.                        0.513    0.29        1.77          0.0769
Muslim_WorkDays                          -0.0926   0.0592      1.57          0.1174
WesternWeekday                           -0.0916   0.0599      1.53          0.1265
DSTRule.America.Adak.                    -0.8802   0.578       1.52          0.1278
MuslimWeekday                            -0.0889   0.0598      1.49          0.1371
Israeli_WorkDays                         -0.0835   0.0586      1.43          0.1540
US_ChristmasDayObserved                  -1.2844   1.0004      1.28          0.1992
US_IndependanceDayObserved                0.5111   0.4092      1.25          0.2116
US_VeteriansDayObserved                   0.5111   0.4092      1.25          0.2116
US_BimonthlyPay                          -0.5918   0.5008      1.18          0.2373
DSTRule.Africa.Windhoek.                 -0.5918   0.5008      1.18          0.2373
US_WorkDays                              -0.0637   0.0583      1.09          0.2743
DSTRule.America.Santiago.                 0.3291   0.3174      1.04          0.2998
US_MemorialDay                           -0.2559   0.2687      0.95          0.3408
Monday                                   -0.0668   0.0808      0.83          0.4084
US_NewYears                              -0.5905   0.7076      0.83          0.4040
Wednesday                                -0.0538   0.0804      0.67          0.5034
August                                   -0.0681   0.1017      0.67          0.5033
DSTRule.Africa.Ceuta.                     0.223    0.3345      0.67          0.5050
DSTRule.Asia.Amman.                       0.223    0.3345      0.67          0.5050
DSTRule.Asia.Tehran.                      0.223    0.3345      0.67          0.5050
Thursday                                 -0.0474   0.0802      0.59          0.5550
DSTRule.America.Havana.                  -0.1848   0.4092      0.45          0.6516
DSTRule.Asia.Damascus.                   -0.1848   0.4092      0.45          0.6516
DSTRule.Asia.Jerusalem.                  -0.1848   0.4092      0.45          0.6516
US_LaborDay                              -0.1843   0.578       0.32          0.7498
US_ColumbusDay                           -0.1843   0.578       0.32          0.7498
DSTRule.Pacific.Tongatapu.               -0.1843   0.578       0.32          0.7498
DSTRule.America.Godthab.                  0.1044   0.3546      0.29          0.7684
DSTRule.America.Goose_Bay.                0.1044   0.3546      0.29          0.7684
Tuesday                                   0.0156   0.0784      0.2           0.8428
Friday                                   -0.0067   0.0784      0.09          0.9320
Sunday                                   -0.003    0.079       0.04          0.9695
US_MartinLuther                          -14.5878  469.3236    0.03          0.9752
US_WashingtonBDay                        -14.5878  469.3236    0.03          0.9752


CHAPTER 5:
Future Work


This thesis has shown a simple, reliable methodology to help analysts correlate temporal rules
to datasets of interest. This supervised learning technique is straightforward and easy to code.
Free statistical tools like R are available to anyone with internet connectivity and can
be used with a little research. Many relevant and useful research questions are left
unanswered. Future work should address these areas, where this methodology can be expanded.

The first and most obvious is to test different data using different temporal granularity. For
example, the Israeli bank data has phone records down to the second; it would be interesting to
see whether the phone activity can be localized based on the observation of local sunset. Rules
could be created to localize the celestial observation down to the tens of minutes. Muslim and
Jewish holidays both start based on local celestial observation. As stated in Chapter 1, this is
of interest because the observed phases of the moon or cycles of the sun differ depending on the
observer's location on the Earth; these can change the observation of the holiday by minutes or
even days. In some cases these differences can be used to localize not only which country, but
where in a country. The rules for this thesis do not encompass all cultures, nor do they account
for phases of the moon or observed sunrise or sunset, but they could. The rules are left to the
imagination of the analysts. Building these rules into the program would provide more flexibility
and allow researchers to test different temporal rules.

Another application might be rules for operational tempo. Operating norms of critical enemy
units could be helpful. For example, subordinate units often have to report their daily SITREP
earlier to higher echelon commands. Therefore, identifying when the lower echelon commands
report might help in understanding a command's location in a military hierarchy or alert
analysts to abnormal behavior.

The data types used in the study were only small samples of different data domains. NPS has
a corpus of used hard drives from the third world. This corpus consists of several terabytes of
data. By using the rules it should be possible to categorize the location where the data was
created using only temporal data found on the drives (i.e., holiday activity for the different
countries). Some of the drives were collected from Spain and Mexico, which have siestas. Rules
created to find this should be able to further increase the confidence in the original area of
creation.

Another possible test would be to identify common inconsistencies in computer logs based on
location and time. Rules may help find changes in log data caused by the computer changing
the local time zone based on location. These changes in the logs could be tested on laptops to
see if temporal analysis can be used to determine the most probable correct time, thereby
identifying the computer's correct location. This is an area of extreme interest in the fields of
computer security and forensics.

Yet another possibility would be to combine data collected from different sources (e.g., e-mail
timestamps, chat, Web surfing history, and installed programs) to predict future activity or to
classify a user's persona (e.g., terrorist, hacker, criminal, lawyer) or uses of the machine.


CHAPTER 6:
Conclusions


This thesis served three valuable purposes. One, it tested Poisson regression for rule-based
data correlation. Two, it demonstrated a current capability to combine human intuition about
temporal events with the speed of computers. Finally, it showed the utility of the RBI
methodology and gave possible areas of future research.

The results of the methodology are extensible and replicable to other forms of regression
analysis, not just Poisson-distributed data. Additionally, the methodology can be used as a
framework to explore different datasets, rules, and temporal granularity.


APPENDIX A:
Time Rules Code


Listing A.1: Time Rules Code

#!/usr/bin/python
"""time_rules.py:

This module contains a list of time rules.

Each rule is a function that takes a python time object and returns
True (in rule) or False (not in rule).

Design:

rule_list -- an array that has all of the rules

apply_rules(tval) -- applies all of the rules and returns an array of
True/False values

apply_rules_to_csv_file(file) -- reads a csv file in the form
'timestamp,count' and returns an array of elements, each in the form:
[timestamp, count, r1, r2, r3 ...]
"""

import datetime
import time
import calendar
import csv
import os
import sys

# rule_list is the array that we will use to hold the instances of all the rules
# that are being analyzed
rule_list = []

class TimeRule:
    DESCRIPTION = "Abstract Rule Class."
    def inRule(self, tval):
        """The default inRule is that it is never in the rule.
        tval is in local time."""
        return False
    def __str__(self):
        return "%s" % (self.__class__.__name__)

class Almanac(TimeRule):
    """An almanac is a rule which returns true if the given day is part of a
    set. Days is a list of datetime.date objects."""
    def __init__(self, name, days, ignore_year=True):
        self.name = name
        self.days = days
        self.ignore_year = ignore_year
    def inRule(self, tval):
        for d in self.days:
            if tval.tm_mon == d.month and tval.tm_mday == d.day:
                if self.ignore_year: return True
                if tval.tm_year == d.year: return True
        return False
    def __str__(self):
        return "Almanac(%s)" % (self.name)

from datetime import date

# Day-of-week rules: tm_wday counts from 0 (Monday) to 6 (Sunday).
class Monday(TimeRule):
    """Is True if it is a Monday in the Gregorian Calendar"""
    def inRule(self, tval):
        return tval.tm_wday == 0
rule_list.append(Monday())

class Tuesday(TimeRule):
    """Is True if it is a Tuesday in the Gregorian Calendar"""
    def inRule(self, tval):
        return tval.tm_wday == 1
rule_list.append(Tuesday())

class Wednesday(TimeRule):
    """Is True if it is a Wednesday in the Gregorian Calendar"""
    def inRule(self, tval):
        return tval.tm_wday == 2
rule_list.append(Wednesday())

class Thursday(TimeRule):
    """Is True if it is a Thursday in the Gregorian Calendar"""
    def inRule(self, tval):
        return tval.tm_wday == 3
rule_list.append(Thursday())

class Friday(TimeRule):
    """Is True if it is a Friday in the Gregorian Calendar"""
    def inRule(self, tval):
        return tval.tm_wday == 4
rule_list.append(Friday())

class Saturday(TimeRule):
    """Is True if it is a Saturday in the Gregorian Calendar"""
    def inRule(self, tval):
        return tval.tm_wday == 5
rule_list.append(Saturday())

class Sunday(TimeRule):
    """Is True if it is a Sunday in the Gregorian Calendar"""
    def inRule(self, tval):
        return tval.tm_wday == 6
rule_list.append(Sunday())

# Month rules: tm_mon counts from 1 (January) to 12 (December).
class January(TimeRule):
    """Is True if it is January"""
    def inRule(self, tval):
        return tval.tm_mon == 1
rule_list.append(January())

class February(TimeRule):
    """Is True if it is February"""
    def inRule(self, tval):
        return tval.tm_mon == 2
rule_list.append(February())

class March(TimeRule):
    """Is True if it is March"""
    def inRule(self, tval):
        return tval.tm_mon == 3
rule_list.append(March())

class April(TimeRule):
    """Is True if it is April"""
    def inRule(self, tval):
        return tval.tm_mon == 4
rule_list.append(April())

class May(TimeRule):
    """Is True if it is May"""
    def inRule(self, tval):
        return tval.tm_mon == 5
rule_list.append(May())

class June(TimeRule):
    """Is True if it is June"""
    def inRule(self, tval):
        return tval.tm_mon == 6
rule_list.append(June())

class July(TimeRule):
    """Is True if it is July"""
    def inRule(self, tval):
        return tval.tm_mon == 7
rule_list.append(July())

class August(TimeRule):
    """Is True if it is August"""
    def inRule(self, tval):
        return tval.tm_mon == 8
rule_list.append(August())

class September(TimeRule):
    """Is True if it is September"""
    def inRule(self, tval):
        return tval.tm_mon == 9
rule_list.append(September())

class October(TimeRule):
    """Is True if it is October"""
    def inRule(self, tval):
        return tval.tm_mon == 10
rule_list.append(October())

class November(TimeRule):
    """Is True if it is November"""
    def inRule(self, tval):
        return tval.tm_mon == 11
rule_list.append(November())

class December(TimeRule):
    """Is True if it is December"""
    def inRule(self, tval):
        return tval.tm_mon == 12
rule_list.append(December())

class WesternWeekday(TimeRule):
    """Western workdays are Monday through Friday"""
    def inRule(self, tval):
        return tval.tm_wday in [0, 1, 2, 3, 4]
rule_list.append(WesternWeekday())

class MuslimWeekday(TimeRule):
    """MuslimWeekdays are Sunday through Thursday"""
    def inRule(self, tval):
        return tval.tm_wday in [6, 0, 1, 2, 3]
rule_list.append(MuslimWeekday())

class IsraeliWeekday(TimeRule):
    """IsraeliWeekdays are Sunday through Thursday"""
    def inRule(self, tval):
        return tval.tm_wday in [6, 0, 1, 2, 3]
rule_list.append(IsraeliWeekday())

class  US.NewYears  (  TimeRule  ) : 

””” NewYears  occurs  the  first  day  of  every  year” 
def  inRule(self  ,  tval  ): 

return  tval.tm_yday  ==  1 
rule_list  .append!  US  _New Years  ( ) ) 

class  US  _MartinLuther  (  TimeRule  ) : 

”””  Martin  Luther  occurs  on  the  third  Monday  in 
def  inRule  (  self  ,  tval  ) : 


January 
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return  ( ( t v al .  tm_mon  ==  1  and  tval.tm_wday  ==  0  and  tval.tm_yday  in  \ 
range (15  ,21))) 

r  u  1  e  _  1  i  s  t  .  append  (  US  _MartinLuther  ( ) ) 
class  US_WashingtonBDay  (  TimeRule  ) : 

”””  Washington  ’ s  Birthday  occurs  on  the  third  Monday  in  February  ””” 
def  inRule  (  self  ,  tval  ) : 

return  (( tval .  tm_mon  ==  2  and  tval.tm_wday  ==  0  and  tval.tm_mday  in  \ 
range (15  ,21))) 

r  u  1  e  _  1  i  s  t  .append  ( US  _ Washing tonB Day  ( ) ) 

class  US _MemorialDay  (  TimeRule  ) : 

”””  Memorial  Day  occures  on  the  last  Monday  in  May  ””” 
def  inRule  (  self  ,  tval  ) : 

return  ( (  t v al  .  tm_mon  ==  5  and  tval.tm_wday  ==  0  and 
((31  —  t val  .  tm_mday)%7  ==  0))) 
r  u  1  e  _  1  i  s  t  .append  ( US  _MemorialDay  ( ) ) 

class  US  _Independance  Day  Ob  served  (  TimeRule  ) : 

”””  Independence  Day  occures  on  July  4  if  falls  on  Sunday  then  observed  on 
Monday  if  it  falls  on  Saturday  then  it  is  observed  on  Friday  ””” 
def  inRule  (  self  ,  tval  ) : 

return  ( ( t v al .  tm_mon  ==  7  and  tval.tm_mday  ==  4  and  tval.tm_wday  <  5) 

or 

(tval.tm_mon  ==  7  and  tval.tm_mday  ==  3  and  tval.tm_wday  ==  4) 

or 

(tval.tm_mon  ==  7  and  tval.tm_mday  ==  5  and  tval.tm_wday  ==  0)) 
rule.list  ,append(US  _Independance  Day  Ob  served  ( ) ) 

class  US_LaborDay  (  TimeRule  ) : 

”””  Labor  Day  occures  on  the  first  Monday  in  September  ””” 
def  inRule  (  self  ,  tval  ) : 

return  ( (  t v al  .  tm_mon  ==  9  and  tval.tm_wday  ==  0  and  tval.tm_mday  in  \ 
range  (1  ,7))) 

r  u 1 e  _ 1 i s  t  . append  ( US_LaborDay ( ) ) 

class  US_ColumbusDay  (  TimeRule  ) : 

”””  Columbus  Day  occures  on  the  second  Monday  in  October  ””” 
def  inRule  (  self  ,  tval  ) : 

return  ((  tval  .  tm_mon  ==  10  and  tval.tm_wday  ==  0  and  tval.tm_mday  in  \ 
range  (7, 14))) 

r  u  1  e  _  1  i  s  t  .append  ( US_ColumbusDay  ( ) ) 

class  US  _  Ve  teri  an  s  Day  Ob  served  (  TimeRule  ) : 

”””  Veterans  Day  occures  on  November  11th  ””” 
def  inRule  (  self  ,  tval  ) : 

return  ( (  t v al  .  tm_mon  ==  11  and  tval.tm_mday  ==  11  and  tval.tm_wday  <  5) 

or 

(tval.tm_mon  ==  11  and  tval.tm_mday  ==  10  and  tval.tm_wday  ==  4) 

or 

(tval.tm_mon  ==  11  and  tval.tm_mday  ==  12  and  tval.tm_wday  ==  \ 
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0)) 

rule_list  .append(US  _  Ve  teri  an  s  Day  Ob  served  ( ) ) 

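class US_Thanksgiving(TimeRule):
    """Thanksgiving occurs on the fourth Thursday in November
    (November 25th in 1999)."""
    def inRule(self, tval):
        # The fourth Thursday falls on the 22nd through the 28th. For 1999 this
        # selects November 25, the date the original rule hard-coded.
        return (tval.tm_mon == 11 and tval.tm_wday == 3 and
                22 <= tval.tm_mday <= 28)
rule_list.append(US_Thanksgiving())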

class US_ChristmasDayObserved(TimeRule):
    """Christmas Day occurs on December 25th. If it falls on a Saturday it is
    observed on Friday; if it falls on a Sunday it is observed on Monday."""
    def inRule(self, tval):
        # The listing tested days 11, 10 and 12 (copied from the Veterans Day
        # rule); Christmas needs days 25, 24 and 26.
        return ((tval.tm_mon == 12 and tval.tm_mday == 25 and tval.tm_wday < 5)
                or
                (tval.tm_mon == 12 and tval.tm_mday == 24 and tval.tm_wday == 4)
                or
                (tval.tm_mon == 12 and tval.tm_mday == 26 and tval.tm_wday == \
                 0))

rule_list.append(US_ChristmasDayObserved())

class US_WorkDays(TimeRule):
    def __init__(self):
        self.US_HOLIDAY_LIST = [US_NewYears(), US_MartinLuther(),
            US_WashingtonBDay(), US_MemorialDay(), US_IndependanceDayObserved(),
            US_LaborDay(), US_ColumbusDay(), US_VeteriansDayObserved(),
            US_Thanksgiving(), US_ChristmasDayObserved()]
        self.WEEKDAY_RULE = WesternWeekday()
    def inRule(self, tval):
        """It is not a workday if it is a US holiday.
        Go through the list of US holidays and see if any of the rules match.
        If so, return False."""
        for r in self.US_HOLIDAY_LIST:
            if r.inRule(tval) == True: return False
        # It is a workday if the day is a weekday
        return self.WEEKDAY_RULE.inRule(tval)
rule_list.append(US_WorkDays())

class US_BimonthlyPay(TimeRule):
    """Pay is on the 1st and 15th of every month when the 1st or 15th is a work
    day."""
    def inRule(self, tval):
        # tm_mday is the day of the month (tm_yday is the day of the year), and
        # the workday rule has to be evaluated against tval rather than merely
        # instantiated.
        return (tval.tm_mday in (1, 15)) and US_WorkDays().inRule(tval)
rule_list.append(US_BimonthlyPay())

# Use the Almanac class to create a Jewish_Holidays_1999 object.
# That object will implement the rules!
Jewish_Holidays_1999 = Almanac("Jewish Holidays 1999",
    [date(1999, 2, 1),    # Tu Bishvat
     date(1999, 3, 2),    # Purim
     date(1999, 4, 1),    # Pesach
     date(1999, 4, 13),   # Yom HaShoah
     date(1999, 4, 21),   # Yom HaAtzmaut
     date(1999, 5, 4),    # Lag BOmer
     date(1999, 5, 21),   # Shavuot
     date(1999, 7, 22),   # Tisha BAv
     date(1999, 9, 11),   # Rosh HaShannah
     date(1999, 9, 20),   # Yom Kippur
     date(1999, 9, 25),   # Sukkot
     date(1999, 10, 2),   # Shemini Atzeret
     date(1999, 10, 3),   # Simhat Torah
     date(1999, 12, 4)    # Chanukah
    ], True)


class Israeli_WorkDays(TimeRule):
    """Return True if it is a day on which somebody works in Israel."""
    def __init__(self):
        # Use one attribute name in both methods.
        self.Israel_Holiday_Rule = Jewish_Holidays_1999
        self.WEEKDAY_RULE = IsraeliWeekday()
    def inRule(self, tval):
        # A day is an Israeli workday if it is a workday and not a holiday
        return self.WEEKDAY_RULE.inRule(tval) and \
               self.Israel_Holiday_Rule.inRule(tval) == False
rule_list.append(Israeli_WorkDays())

# Use the Almanac class to create a Muslim_Holidays_1999 object.
# That object will implement the rules!
Muslim_Holidays_1999 = Almanac("Muslim Holidays 1999",
    [date(1999, 1, 19),   # Id al-Fitr
     date(1999, 3, 28),   # Id al-Adha
     date(1999, 4, 17),   # New Years
     date(1999, 4, 26),   # Ashura
     date(1999, 6, 26),   # Mawlid
     date(1999, 12, 9)    # Ramadan
    ], True)


class Muslim_WorkDays(TimeRule):
    """Return True if it is a day on which somebody works in a Muslim country."""
    def __init__(self):
        self.Muslim_Holiday_Rule = Muslim_Holidays_1999
        self.WEEKDAY_RULE = MuslimWeekday()
    def inRule(self, tval):
        # A day is a Muslim workday if it is a workday and not a holiday
        return self.WEEKDAY_RULE.inRule(tval) and \
               self.Muslim_Holiday_Rule.inRule(tval) == False
rule_list.append(Muslim_WorkDays())

import dstrules
class DSTRule(TimeRule):
    """This rule returns True if there is a change to DST or ST for a given
    timezone on a given day."""
    def __init__(self, timezone):
        self.timezone = timezone
    def inRule(self, tval):
        return dstrules.is_change_day(self.timezone, tval.tm_year, tval.tm_mon, \
                                      tval.tm_mday)
    def __str__(self):
        return "DSTRule(%s)" % self.timezone


# Add all of the rules that we haven't seen before
seen_ids = set()
for TZ in dstrules.time_zones():
    rulestr = str(dstrules.timezone_id(TZ))
    if rulestr not in seen_ids:
        rule_list.append(DSTRule(TZ))
        seen_ids.add(rulestr)


# Our handy time-parser. Can parse any time! Really!
time_format_list = [
    "%Y-%m-%d %H:%M:%S",
    # (two further format strings are illegible in the source listing)
    "%m/%d/%y"]
def parse_time(s):
    """Parse the string s and return a struct_time.
    Returns None if no format in time_format_list matches."""
    for f in time_format_list:
        try:
            return time.strptime(s, f)
        except ValueError:
            continue

def apply_rules(tval):
    return [r.inRule(tval) for r in rule_list]

if __name__ == "__main__":
    from optparse import OptionParser
    global options

    parser = OptionParser()
    parser.usage = "usage: %prog [options] <inputfile>"
    parser.add_option("-l", "--list", help="List rules", action="store_true")
    # --test receives the time string to parse, so it has to store a value
    # rather than act as a boolean flag.
    parser.add_option("-t", "--test", help="Test rules with a time")
    parser.add_option("-s", "--show",
                      help="print the time rule information for a timezone")
    parser.add_option("--tex", help="output rules in LaTeX format",
                      action="store_true")
    (options, args) = parser.parse_args()
    if options.show:
        print dstrules.timezone_id(options.show)
        exit(0)

    if options.tex:
        print "\\def\\TotalRules{%d}\n" % len(rule_list)
        print "\\def\\AllMyRules{"
        for r in rule_list:
            # The arguments to replace() are cut off in the source listing;
            # escaping underscores for LaTeX is an assumption.
            print "\\myrule{%s}" % (str(r).replace("_", "\\_"))
        print "}"
        exit(0)

    if options.list:
        print "There are %d rules:" % (len(rule_list))
        for r in rule_list:
            print r
        print
        exit(0)

    if options.test:
        tval = parse_time(options.test)
        print options.test, "=", tval
        result = apply_rules(tval)
        for i in range(len(result)):
            print rule_list[i], result[i]
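
The floating-holiday rules above (first Monday, second Monday, last Monday) all reduce to one question: is a given day the n-th, or the last, occurrence of a weekday in its month? The following is a minimal sketch of that test, not part of the thesis code. The helper names is_nth_weekday and is_last_weekday are hypothetical, and the number of days in the month is passed in by the caller as an assumption to keep the example short.

```python
from datetime import date


def is_nth_weekday(tval, month, weekday, n):
    """True if tval is the n-th `weekday` (0 = Monday) of `month`.

    Days 1-7 hold the first occurrence of every weekday, days 8-14 the
    second, and so on, so the occurrence index is (tm_mday - 1) // 7.
    """
    return (tval.tm_mon == month and tval.tm_wday == weekday
            and (tval.tm_mday - 1) // 7 == n - 1)


def is_last_weekday(tval, month, weekday, days_in_month):
    """True if tval is the last `weekday` of a month with `days_in_month` days.

    The last occurrence lies within the final seven days of the month.
    """
    return (tval.tm_mon == month and tval.tm_wday == weekday
            and days_in_month - tval.tm_mday < 7)


# 1999 dates used as spot checks (hypothetical example values)
labor_day_1999 = date(1999, 9, 6).timetuple()      # first Monday in September
memorial_day_1999 = date(1999, 5, 31).timetuple()  # last Monday in May
earlier_may_monday = date(1999, 5, 24).timetuple() # a Monday, but not the last
```

With these helpers, a rule such as Labor Day could be expressed as is_nth_weekday(tval, 9, 0, 1), which avoids off-by-one mistakes in hand-written day ranges.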
APPENDIX B:
Timeline Code


Listing B.1: Timeline Code

# To change this template, choose Tools | Templates
# and open the template in the editor.
# Please read http://seehuhn.de/pages/pdate

__author__ = "LCDR Kris Kearton"
__date__ = "$Dec 3, 2009 2:51:34 PM$"

from datetime import datetime, timedelta, date
import csv
import time
import os

def datetimeIterator(from_date=datetime.now(), to_date=None):
    while to_date is None or from_date < to_date:
        yield from_date
        from_date = from_date + timedelta(days=1)
    return

def dict_to_count(d):
    # Return the (date, count) pairs of d as a list sorted by date
    keys = sorted(d.keys())
    ret = []
    for date in keys:
        ret.append((date, d[date]))
    return ret

def read_acts(fname):
    """Read the ACTS database from a file and return a list of
    (date, count) tuples."""
    ret = []
    for line in csv.reader(open(fname, "U")):
        when = datetime.strptime(line[0], "%m/%d/%y")
        count = line[1]
        ret.append((when, count))
    return ret

def read_phonecenter(fname):
    headings = None
    tally = {}  # tally by date
    for line in csv.reader(open(fname, "U"), delimiter="\t"):
        if not headings:
            headings = line  # get the headings
            continue
        try:
            # Field index 5 holds the date/time
            when = datetime.strptime(line[5], "%y%m%d")  # just use the date
        except ValueError:
            # In one case, there were spaces instead of a tab, so grab the
            # previous field
            when = datetime.strptime(line[4], "%y%m%d")  # just use the date

        try:
            tally[when] += 1  # increment the count for that date
        except KeyError:
            tally[when] = 1
    return dict_to_count(tally)

def read_gtd(fname):
    tally = {}  # tally by date
    for line in csv.reader(open(fname, "U")):
        if line[3] == "":
            continue  # no day in database
        year = int(line[1])
        month = int(line[2])
        day = int(line[3])
        # Fall back to the month and day embedded in the first field
        if month == 0: month = int(line[0][4:6])
        if day == 0: day = int(line[0][6:8])
        if year == 0 or month == 0 or day == 0:
            continue
        try:
            when = datetime(year, month, day)
        except ValueError:
            print "bad line", line
            continue
        try:
            tally[when] += 1  # increment the count for that date
        except KeyError:
            tally[when] = 1
    return dict_to_count(tally)

def make_zone_array():
    """Return a list of all the timezones"""
    print "make zone array"
    timezone_dir = []
    tz = csv.reader(open("data/tz_zone_only.tab", "U"), delimiter="\t")
    for zone in tz:
        timezone_dir.append(zone[2])
    return timezone_dir

def compare_to_dst(dates_in, zone_array):
    print "entering compare_to_dist"
    flag = 0
    zone_dict = {}
    count = 0
    zonetemp = "Empty"
    for zone in zone_array:
        os.environ['TZ'] = zone
        time.tzset()
        for year in range(1999, 2004):
            for month in range(1, 13):
                # left out 29, 30, and 31 as dates as std
                # happens in middle of month
                for day in range(1, 28):
                    for hour in range(0, 24):
                        dst_flag = time.localtime(time.mktime((year, month, \
                            day, hour, 0, 0, 0, 0, -1)))
                        temp = dst_flag[8]
                        if temp != flag:
                            t = time.strftime("%Y-%m-%d", dst_flag)
                            for idx in range(0, len(dates_in)):
                                if dates_in[idx] == t:
                                    # adds only time zones with
                                    # 5 or more hits
                                    count = count + 1
                                    if count > 4:
                                        zone_dict[zone] = count
                            flag = temp
        count = 0  # resets count after each zone
        print zone_dict
        zone = zonetemp
    return zone_dict

def eval_rule(rule, eventArray):
    """Given a rule, compute the number of events that match the rule and the
    number that don't."""
    rule_in = 0
    rule_out = 0
    for (day, count) in eventArray:
        if rule.inRule(day):
            rule_in += count
        else:
            rule_out += count
    print rule, "in: ", rule_in, "out: ", rule_out
if __name__ == "__main__":
    """starts here"""
    #vals = read_acts("data/acts_Calls99_08.csv")
    #vals = read_phonecenter("data/bank_phonecenter_99.txt")
    vals = read_gtd("data/globalterrorismdb_0509dist.csv")
    dict1999 = {}
    eventArray1999 = filter(lambda x: x[0].year == 1999, vals)
    for (day, count) in eventArray1999:
        dict1999[day.date()] = count

    import time_rules

    arrayWithTimeTuples = []
    poissonArray = []

    timelinedata = csv.writer(open("gtd.csv", "wb"))
    rows = []
    row = ["Day", "Count"]
    for rule in time_rules.rule_list:
        row.append(str(rule))
    rows.append(row)

    # Generate a list of all the rows
    for day in datetimeIterator(date(1999, 1, 1), date(2000, 1, 1)):
        count = dict1999.get(day, 0)
        arrayWithTimeTuples.append((day.timetuple(), count))
        row = [day, count]
        for rule in time_rules.rule_list:
            if rule.inRule(day.timetuple()):
                row.append(1)
            else:
                row.append(0)
        rows.append(row)

    # Remove duplicate columns

    # This function turns any column into a string
    def column2string(col_number):
        col = []
        for row in rows[1:]:
            col.append(str(row[col_number]))
        return "-".join(col)

    # Now we are going to make clean_rows, which is all of the rows
    # without duplicate columns
    max_rows = len(rows)
    max_columns = len(rows[0])
    new_rows = []
    for i in range(0, max_rows):
        new_rows.append([])
    seen_columns = set()
    for column_number in range(0, max_columns):
        column_string = column2string(column_number)
        if column_string not in seen_columns:
            # copy this column over
            for rownumber in range(0, len(rows)):
                new_rows[rownumber].append(rows[rownumber][column_number])
            seen_columns.add(column_string)

    # Now write out the cleaned table
    for row in new_rows:
        timelinedata.writerow(row)
    """ends here"""

    """This section interfaces with R and calculates the Poisson regression and
    builds the LaTeX table"""
    from rpy2.robjects import r

    r('p<-read.table("/Users/positiveforcel/Documents/Office\ Projects/Thesis/Timeline2/timeline/gtd.csv", sep=",", header = TRUE)')
    r('attach(p)')
    names = r('names(p)')
    print names
    summaryArray = []

    """ALL RULES ONE AT A TIME"""
    # Columns 0 and 1 are Day and Count; the rule columns start at index 2
    for i in range(2, len(names)):
        first = "r('ml<-glm(Count ~ "
        last = ", family = poisson)')"
        r('ml<-glm(Count ~ ' + names[i] + ', family = poisson)')
        r('library(xtable)')
        out = r('x<-xtable(ml)')
        summaryArray.append(str(r('x<-xtable(ml)')))
        print out